RPubs Link: https://rpubs.com/vedant_1997/IS415_Take-home_Ex04
Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level.
Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using multiple linear regression method.
Prepare a choropleth map showing the distribution of the residual of the GDP per capita.
Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using geographically weighted regression method.
Prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.
Brazil Cities CSV
Brazil Municipality polygons
Explore the datasets available in the geobr library for municipality-level data.
## # A tibble: 22 x 4
## `function` geography years source
## <chr> <chr> <chr> <chr>
## 1 `read_country` Country 1872, 1900, 1911, 1920, 1933, ~ IBGE
## 2 `read_region` Region 2000, 2001, 2010, 2013, 2014, ~ IBGE
## 3 `read_state` States 1872, 1900, 1911, 1920, 1933, ~ IBGE
## 4 `read_meso_regi~ Meso region 2000, 2001, 2010, 2013, 2014, ~ IBGE
## 5 `read_micro_reg~ Micro region 2000, 2001, 2010, 2013, 2014, ~ IBGE
## 6 `read_intermedi~ Intermediate region 2017 IBGE
## 7 `read_immediate~ Immediate region 2017 IBGE
## 8 `read_municipal~ Municipality 1872, 1900, 1911, 1920, 1933, ~ IBGE
## 9 `read_weighting~ Census weighting are~ 2010 IBGE
## 10 `read_census_tr~ Census tract (setor ~ 2000, 2010 IBGE
## # ... with 12 more rows
Due to an SSL error, the municipality data was saved as a shapefile using st_write() and then read back from disk for the analysis.
# Download the 2016 municipality boundaries for all of Brazil
municipalities <- read_municipality(code_muni = "all", year = 2016)
my_sf <- st_as_sf(municipalities)
class(municipalities)
# Save a local copy as an ESRI Shapefile
st_write(municipalities, dsn = 'data/geospatial', layer = 'municipalities', driver = 'ESRI Shapefile')
## Reading layer `municipalities' from data source `D:\Geospatial Analytics\Take_home\Take-home_Ex04\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 5572 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 5.271841
## proj4string: +proj=longlat +ellps=GRS80 +no_defs
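The write-then-read workaround can be sketched on a toy layer. This is a minimal illustration, not the author's code: the polygon, the column names `code` and `name`, and the temporary file path are all hypothetical, and a direct `.shp` file path is used instead of a directory plus layer name.

```r
library(sf)

# Toy polygon standing in for the municipality boundaries (hypothetical data).
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_muni <- st_sf(
  code = 1L,
  name = "Toy",
  geometry = st_sfc(square, crs = 4674)
)

# Write the layer once to a local shapefile ...
shp_path <- tempfile(fileext = ".shp")
st_write(toy_muni, shp_path, quiet = TRUE)

# ... and read it back from disk instead of downloading it again.
toy_back <- st_read(shp_path, quiet = TRUE)
nrow(toy_back)
```

Note that the shapefile format limits field name length, so column names may come back abbreviated after the round trip.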
Make sure that the geometry is valid
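One way to run such a check is with `st_is_valid()` and `st_make_valid()` from sf. The sketch below uses two toy polygons rather than the municipality layer; the self-intersecting "bow-tie" shape is an assumption chosen only to trigger an invalid result, and no CRS is set to keep the example small.

```r
library(sf)

# A self-intersecting "bow-tie" ring (invalid) and a plain square (valid).
bowtie <- st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0))))
okay   <- st_polygon(list(rbind(c(2, 0), c(3, 0), c(3, 1), c(2, 1), c(2, 0))))
toy <- st_sf(id = 1:2, geometry = st_sfc(bowtie, okay))

# One TRUE/FALSE per feature.
valid_before <- st_is_valid(toy)

# Repair invalid geometries, then check again.
toy_fixed <- st_make_valid(toy)
valid_after <- st_is_valid(toy_fixed)
```

If every feature is already valid, `st_make_valid()` can simply be skipped.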
Check for any duplicates of municipalities
## [1] FALSE
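A duplicate check of this kind can be written with base R's `duplicated()`. The sketch below is illustrative only: the toy table and the column names `code_muni` and `name_muni` are assumptions, and the document does not state which column the author tested. Checking the numeric code is usually safer than checking the name, since several Brazilian municipalities share a name.

```r
# Toy attribute table (hypothetical rows).
toy <- data.frame(
  code_muni = c(1100015, 1100023, 1100031),
  name_muni = c("Bom Jesus", "Bom Jesus", "Cabixi")
)

# The codes are unique, but the names are not.
dup_codes <- any(duplicated(toy$code_muni))
dup_names <- any(duplicated(toy$name_muni))
```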
For this transformation, we used EPSG:5641 (SIRGAS 2000 / Brazil Mercator), a projected coordinate reference system for Brazil.
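The reprojection itself is a single `st_transform()` call. As a minimal sketch, the example below reprojects one toy point (an arbitrary location, not taken from the data) from geographic coordinates in EPSG:4674 to EPSG:5641, so that distances are expressed in metres rather than degrees.

```r
library(sf)

# One toy point in geographic coordinates (EPSG:4674, SIRGAS 2000).
pt <- st_sfc(st_point(c(-47.9, -15.8)), crs = 4674)

# Reproject to EPSG:5641 (SIRGAS 2000 / Brazil Mercator).
pt_proj <- st_transform(pt, 5641)

# The result is no longer in longitude/latitude.
st_is_longlat(pt_proj)
```

The same call works on a whole sf data frame, for example a polygon layer.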
The cities file is read with the read_delim() function because its values are separated by “;”.
## # A tibble: 6 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abad~ GO 0 6876 6876 0 2137
## 2 Abad~ MG 0 6704 6704 0 2328
## 3 Abad~ GO 0 15757 15609 148 4655
## 4 Abae~ MG 0 22690 22690 0 7694
## 5 Abae~ PA 0 141100 141040 60 31061
## 6 Abai~ CE 0 10496 10496 0 2791
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5573 Length:5573 Min. :0.000000 Min. : 805
## Class :character Class :character 1st Qu.:0.000000 1st Qu.: 5235
## Mode :character Mode :character Median :0.000000 Median : 10934
## Mean :0.004845 Mean : 34278
## 3rd Qu.:0.000000 3rd Qu.: 23424
## Max. :1.000000 Max. :11253503
## NA's :8
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.0 Min. : 239 Min. : 60
## 1st Qu.: 5230 1st Qu.: 0.0 1st Qu.: 1572 1st Qu.: 874
## Median : 10926 Median : 0.0 Median : 3174 Median : 1846
## Mean : 34200 Mean : 77.5 Mean : 10303 Mean : 8859
## 3rd Qu.: 23390 3rd Qu.: 10.0 3rd Qu.: 6726 3rd Qu.: 4624
## Max. :11133776 Max. :119727.0 Max. :3576148 Max. :3548433
## NA's :8 NA's :8 NA's :10 NA's :10
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 487 1st Qu.: 2801 1st Qu.: 38.0 1st Qu.: 158
## Median : 931 Median : 6170 Median : 92.0 Median : 376
## Mean : 1463 Mean : 27595 Mean : 383.3 Mean : 1544
## 3rd Qu.: 1832 3rd Qu.: 15302 3rd Qu.: 232.0 3rd Qu.: 951
## Max. :33809 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :81 NA's :8 NA's :8 NA's :8
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 259 1st Qu.: 1734 1st Qu.: 341
## Median : 516 Median : 588 Median : 3841 Median : 722
## Mean : 2069 Mean : 2381 Mean : 18212 Mean : 3004
## 3rd Qu.: 1300 3rd Qu.: 1478 3rd Qu.: 9628 3rd Qu.: 1724
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :8 NA's :8 NA's :8 NA's :8
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2326 1st Qu.:1392 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2783 Median :0.6650
## Mean : 14179.9 Mean : 57384 Mean :2783 Mean :0.6592
## 3rd Qu.: 11194.2 3rd Qu.: 55619 3rd Qu.:4174 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :3 NA's :3 NA's :8 NA's :8
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.40
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :8 NA's :8 NA's :8 NA's :9
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1 Min. : 3
## 1st Qu.:-22.838 1st Qu.: 169.8 1st Qu.: 88 1st Qu.: 119
## Median :-18.089 Median : 406.5 Median : 247 Median : 327
## Mean :-16.444 Mean : 893.8 Mean : 3094 Mean : 6567
## 3rd Qu.: -8.489 3rd Qu.: 628.9 3rd Qu.: 815 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668 Max. :5543127
## NA's :9 NA's :9 NA's :3 NA's :3
## AREA REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## Min. : 3.57 Length:5573 Length:5573 Min. : 786
## 1st Qu.: 204.44 Class :character Class :character 1st Qu.: 5454
## Median : 416.59 Mode :character Mode :character Median : 11590
## Mean : 1517.44 Mean : 37432
## 3rd Qu.: 1026.57 3rd Qu.: 25296
## Max. :159533.33 Max. :12176866
## NA's :3 NA's :3
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## Length:5573 Min. : 0 Min. : 1 Min. : 2
## Class :character 1st Qu.: 4189 1st Qu.: 1726 1st Qu.: 10112
## Mode :character Median : 20426 Median : 7424 Median : 31211
## Mean : 47271 Mean : 175928 Mean : 489451
## 3rd Qu.: 51227 3rd Qu.: 41022 3rd Qu.: 115406
## Max. :1402282 Max. :63306755 Max. :464656988
## NA's :3 NA's :3 NA's :3
## GVA_PUBLIC GVA_TOTAL TAXES GDP
## Min. : 7 Min. : 17 Min. : -14159 Min. : 15
## 1st Qu.: 17267 1st Qu.: 42253 1st Qu.: 1305 1st Qu.: 43709
## Median : 35866 Median : 119492 Median : 5100 Median : 125153
## Mean : 123768 Mean : 832987 Mean : 118864 Mean : 954584
## 3rd Qu.: 89245 3rd Qu.: 313963 3rd Qu.: 22197 3rd Qu.: 329539
## Max. :41902893 Max. :569910503 Max. :117125387 Max. :687035890
## NA's :3 NA's :3 NA's :3 NA's :3
## POP_GDP GDP_CAPITA GVA_MAIN MUN_EXPENDIT
## Min. : 815 Min. : 3191 Length:5573 Min. :1.421e+06
## 1st Qu.: 5483 1st Qu.: 9058 Class :character 1st Qu.:1.573e+07
## Median : 11578 Median : 15870 Mode :character Median :2.746e+07
## Mean : 36998 Mean : 21126 Mean :1.043e+08
## 3rd Qu.: 25085 3rd Qu.: 26155 3rd Qu.:5.666e+07
## Max. :12038175 Max. :314638 Max. :4.577e+10
## NA's :3 NA's :3 NA's :1492
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 162.0 Median : 2.00 Median : 0.000 Median : 11.00
## Mean : 906.8 Mean : 18.25 Mean : 1.852 Mean : 73.44
## 3rd Qu.: 448.0 3rd Qu.: 8.00 3rd Qu.: 2.000 3rd Qu.: 39.00
## Max. :530446.0 Max. :1948.00 Max. :274.000 Max. :31566.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 1.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 32.0
## Median : 0.0000 Median : 0.000 Median : 4.00 Median : 74.5
## Mean : 0.4262 Mean : 2.029 Mean : 43.26 Mean : 348.0
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00 3rd Qu.: 199.0
## Max. :332.0000 Max. :657.000 Max. :25222.00 Max. :150633.0
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41 Mean : 55.88 Mean : 24.74 Mean : 15.55
## 3rd Qu.: 25 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515 Max. :29290.00 Max. :38720.00 Max. :23738.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 0.00 Median : 4.00 Median : 4.0 Median : 2.000
## Mean : 15.14 Mean : 51.29 Mean : 83.7 Mean : 3.269
## 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.0 3rd Qu.: 3.000
## Max. :14003.00 Max. :49181.00 Max. :76757.0 Max. :204.000
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_P COMP_Q COMP_R COMP_S
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.: 5.00
## Median : 6.00 Median : 3.00 Median : 2.00 Median : 12.00
## Mean : 30.96 Mean : 34.15 Mean : 12.18 Mean : 51.61
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00 3rd Qu.: 31.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00 Max. :24832.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_T COMP_U HOTELS BEDS
## Min. :0 Min. : 0.00000 Min. : 1.000 Min. : 2.0
## 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000 1st Qu.: 40.0
## Median :0 Median : 0.00000 Median : 1.000 Median : 82.0
## Mean :0 Mean : 0.05027 Mean : 3.131 Mean : 257.5
## 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000 3rd Qu.: 200.0
## Max. :0 Max. :123.00000 Max. :97.000 Max. :13247.0
## NA's :3 NA's :3 NA's :4686 NA's :4686
## Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.00
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000 1st Qu.:1.00
## Median : 1.000 Median : 2.000 Median : 1.000 Median :2.00
## Mean : 3.383 Mean : 2.829 Mean : 1.312 Mean :1.58
## 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.:2.00
## Max. :1693.000 Max. :626.000 Max. :83.000 Max. :8.00
## NA's :2231 NA's :2231 NA's :2231 NA's :2231
## Pr_Assets Pu_Assets Cars Motorcycles
## Min. :0.000e+00 Min. :0.000e+00 Min. : 2 Min. : 4
## 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602 1st Qu.: 591
## Median :3.231e+07 Median :1.339e+08 Median : 1438 Median : 1285
## Mean :9.180e+09 Mean :6.005e+09 Mean : 9859 Mean : 4879
## 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4086 3rd Qu.: 3294
## Max. :1.947e+13 Max. :8.016e+12 Max. :5740995 Max. :1134570
## NA's :2231 NA's :2231 NA's :11 NA's :11
## Wheeled_tractor UBER MAC WAL-MART
## Min. : 0.000 Min. :1 Min. : 1.000 Min. : 1.000
## 1st Qu.: 0.000 1st Qu.:1 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 0.000 Median :1 Median : 2.000 Median : 1.000
## Mean : 5.754 Mean :1 Mean : 4.277 Mean : 2.059
## 3rd Qu.: 1.000 3rd Qu.:1 3rd Qu.: 3.000 3rd Qu.: 1.750
## Max. :3236.000 Max. :1 Max. :130.000 Max. :26.000
## NA's :11 NA's :5448 NA's :5407 NA's :5471
## POST_OFFICES
## Min. : 1.000
## 1st Qu.: 1.000
## Median : 1.000
## Mean : 2.081
## 3rd Qu.: 2.000
## Max. :225.000
## NA's :120
From the summary above, 9 rows have missing values in the LONG and LAT columns. Further research showed that the coordinates available online are not in the same projection as the LONG and LAT columns in the dataset. It is therefore better to remove these 9 rows than to risk introducing geographical inaccuracies.
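Reading a semicolon-delimited file and counting missing coordinates can be sketched as follows. The four-line file written to a temporary path is a hypothetical stand-in for the cities CSV; only the column names CITY, STATE, LONG and LAT are taken from the dataset.

```r
library(readr)

# Write a tiny ";"-separated file to a temporary path (hypothetical rows).
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("CITY;STATE;LONG;LAT",
    "A;GO;-49.4;-16.7",
    "B;MG;;",
    "C;PA;-48.9;-1.7"),
  tmp
)

# delim = ";" tells read_delim() how the fields are separated;
# col_types = cols() guesses the column types without printing a message.
toy_cities <- read_delim(tmp, delim = ";", col_types = cols())

# Number of missing values per coordinate column.
n_missing <- colSums(is.na(toy_cities[, c("LONG", "LAT")]))
```

Empty fields are read as `NA`, which is what makes the missing coordinates show up in `summary()`.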
The metadata describes each field of the file above.
## FIELD.DESCRIPTION.REFERENCE.UNIT.SOURCE.
## 1 CITY;Name of the City;;;-;
## 2 STATE;Name of the State;;;-;
## 3 CAPITAL;1 if Capital of State;;;-;
## 4 IBGE_RES_POP;Resident Population ;2010;-;https://sidra.ibge.gov.br/tabela/1497;
## 5 IBGE_RES_POP_BRAS;Resident Population Brazilian;2010;-;https://sidra.ibge.gov.br/tabela/1497;
## 6 IBGE_RES_POP_ESTR;Redident Population Foreigners;2010;-;https://sidra.ibge.gov.br/tabela/1497;
## 7 IBGE_DU;Domestic Units Total ;2010;-;https://sidra.ibge.gov.br/tabela/3495;
## 8 IBGE_DU_URBAN;Domestic Units Urban ;2010;-;https://sidra.ibge.gov.br/tabela/3495;
## 9 IBGE_DU_RURAL;Domestic Units Rural;2010;-;https://sidra.ibge.gov.br/tabela/3495;
## 10 IBGE_POP;Resident Population Regular Urban Planning;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 11 IBGE_1;Resident Population Regular Urban Planning - until 1 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 12 IBGE_1-4;Resident Population Regular Urban Planning - from 1 to 4 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 13 IBGE_5-9;Resident Population Regular Urban Planning - from 4 to 9 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 14 IBGE_10-14;Resident Population Regular Urban Planning - from 10 to 14 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 15 IBGE_15-59;Resident Population Regular Urban Planning - from 15 to 59 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 16 IBGE_60+;Resident Population Regular Urban Planning - above 60 y.o;2010;-;https://sidra.ibge.gov.br/tabela/3365;
## 17 IBGE_PLANTED_AREA;Planted Area (hectares) ;2017;1 hectare (1 hectare = 10,000 square meters);https://sidra.ibge.gov.br/tabela/5457;
## 18 IBGE_CROP_PRODUCTION_$;Crop Production;2017;$ 1,000 reais;https://sidra.ibge.gov.br/tabela/5457;
## 19 IDHM Ranking;HDI Ranking;2010;-;http://www.br.undp.org/content/brazil/pt/home/idh0.html;
## 20 IDHM;HDI Human Development Index;2010;-;http://www.br.undp.org/content/brazil/pt/home/idh0.html;
## 21 IDHM_Renda;HDI GNI Index;2010;-;http://www.br.undp.org/content/brazil/pt/home/idh0/rankings/idhm-municipios-2010.html;
## 22 IDHM_Longevidade;HDI Life Expectancy index;2010;-;http://www.br.undp.org/content/brazil/pt/home/idh0/rankings/idhm-municipios-2010.html;
## 23 IDHM_Educacao;HDI Education index;2010;-;http://www.br.undp.org/content/brazil/pt/home/idh0/rankings/idhm-municipios-2010.html;
## 24 LONG;City Latitude ;2010;-;ftp://geoftp.ibge.gov.br/organizacao_do_territorio/estrutura_territorial/localidades;
## 25 LAT;City Longitude ;2010;-;ftp://geoftp.ibge.gov.br/organizacao_do_territorio/estrutura_territorial/localidades;
## 26 ALT;City Elevation (meters);2010;1 meter;ftp://geoftp.ibge.gov.br/organizacao_do_territorio/estrutura_territorial/localidades;
## 27 PAY_TV;PayTV users;2019-03;-;https://cloud.anatel.gov.br/index.php/s/TpaFAwSw7RPfBa8?path=%2FTV_por_Assinatura%2FPor_Municipio;
## 28 FIXED_PHONES;Fixed Fones (not cell phones) users;2019-03;-;https://cloud.anatel.gov.br/index.php/s/TpaFAwSw7RPfBa8?path=%2FTelefonia_Fixa%2FPor_Municipio;
## 29 AREA;City area (squared kilometers);2018;1 squared Kilometer (1 kilometer = 1,000,000 square meters);https://www.ibge.gov.br/geociencias/organizacao-do-territorio/estrutura-territorial/15761-areas-dos-municipios.html?t=acesso-ao-produto&c=1;
## 30 REGIAO_TUR;Turism Category Region;2017;-;http://dados.turismo.gov.br/mapa-do-turismo-brasileiro;
## 31 CATEGORIA_TUR;Turism Category ;2017;-;http://dados.turismo.gov.br/mapa-do-turismo-brasileiro;
## 32 ESTIMATED_POP;Estimated Population;2018-07;-;https://www.ibge.gov.br/estatisticas/sociais/populacao/9103-estimativas-de-populacao.html?=&t=o-que-e;
## 33 RURAL_URBAN;Rural or Urban Tipology;2016;-;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 34 GVA_AGROPEC;Gross Added Value - Agropecuary;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 35 GVA_INDUSTRY;Gross Added Value - Industry;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 36 GVA_SERVICES;Gross Added Value - Services;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 37 GVA_PUBLIC;Gross Added Value - Public Services;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 38 GVA_TOTAL;Total Gross Added Value;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 39 TAXES;Taxes;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 40 GDP;Gross Domestic Product;2016;$ 1,000 reais;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 41 POP_GDP;Population;2016;-;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 42 GDP_CAPITA;Gross Domestic Product per capita;2016;-;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;
## 43 GVA_MAIN;Activity with higher GVA contribution;2016;-;https://www.ibge.gov.br/estatisticas/economicas/contas-nacionais/9088-produto-interno-bruto-dos-municipios.html?t=downloads;Para as variáveis com as maiores atividades econômicas foram consideradas:\nAgricultura, inclusive apoio à agricultura e a pós colheita; \nPecuária, inclusive apoio à pecuária;\nProdução florestal, pesca e aquicultura; \nIndústrias extrativas; \nIndústrias de transformação; \nEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação;\nConstrução;\nComércio e reparação de veículos automotores e motocicletas; \nAdministração, defesa, educação e saúde públicas e seguridade social; e\nDemais serviços. \n\nA classe Demais serviços compreende a agregação dos setores: \nTransporte, armazenagem e correio; \nAlojamento e alimentação; \nInformação e comunicação; \nAtividades financeiras, de seguros e serviços relacionados; \nAtividades imobiliárias; \nAtividades profissionais, científicas e técnicas, administrativas e serviços complementares;\nEducação e saúde privadas; \nArtes, cultura, esporte e recreação e outras atividades de serviços e serviços domésticos.
## 44 MUN_EXPENDIT;Municipal expenditures - in reais;2016;$ 1 real;http://www.tesourotransparente.gov.br/ckan/dataset/dcam;
## 45 COMP_TOT;Total number of companies;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 46 COMP_A;Number of Companies: Agriculture, livestock, forestry, fishing and aquaculture;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 47 COMP_B;Number of Companies: Extractive industries;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 48 COMP_C;Number of Companies: Industries of transformation;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 49 COMP_D;Number of Companies: Electricity and gas;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 50 COMP_E;Number of Companies: Water, sewage, waste management and decontamination activities;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 51 COMP_F;Number of Companies: Construction;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 52 COMP_G;Number of Companies: Trade; repair of motor vehicles and motorcycles;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 53 COMP_H;Number of Companies: Transport, storage and mail;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 54 COMP_I;Number of Companies: Accommodation and food;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 55 COMP_J;Number of Companies: Information and communication;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 56 COMP_K;Number of Companies: Financial, insurance and related services activities;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 57 COMP_L;Number of Companies: Real estate activities;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 58 COMP_M;Number of Companies: Professional, scientific and technical activities;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 59 COMP_N;Number of Companies: Administrative activities and complementary services;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 60 COMP_O;Number of Companies: Public administration, defense and social security;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 61 COMP_P;Number of Companies: Education;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 62 COMP_Q;Number of Companies: Human health and social services;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 63 COMP_R;Number of Companies: Arts, culture, sport and recreation;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 64 COMP_S;Number of Companies: Other service activities;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 65 COMP_T;Number of Companies: Domestic services;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 66 COMP_U;Number of Companies: International and other extraterritorial institutions;2016;-;https://sidra.ibge.gov.br/tabela/993;
## 67 HOTELS;Total number of hotels;2019-03;-;http://dados.turismo.gov.br/cadastur;
## 68 BEDS;Toal number of hotel beds;2019-03;-;http://dados.turismo.gov.br/cadastur;
## 69 Pr_Agencies;Total number of private bank agencies;2019-02;-;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 70 Pu_Agencies;Total number of public bank agencies;2019-02;-;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 71 Pr_Bank;Total number of private banks;2019-02;-;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 72 Pu_Bank;Total number of public banks;2019-02;-;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 73 Pr_Assets;Total amount of private bank assets;2019-02;$ 1 real;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 74 Pu_Assets;Total amount of public bank assets;2019-02;$ 1 real;https://www.bcb.gov.br/estatisticas/estatisticabancariamunicipios;
## 75 Cars;Total number of cars;2019-01;-;https://www.denatran.gov.br/estatistica/639-frota-2019;
## 76 Motorcycles;Total number of motorcycles, scooters, moped;2019-01;-;https://www.denatran.gov.br/estatistica/639-frota-2019;
## 77 Wheeled_tractor;Total number of wheeled tractors;2019-01;-;https://www.denatran.gov.br/estatistica/639-frota-2019;
## 78 UBER;1 if UBER ;2019-05;-;https://www.uber.com/en-BR/cities/;
## 79 MAC;Total number of Mac Donalds stores;2018-11;-;https://www.mcdonalds.com.br/enderecos;
## 80 WALLMART;Total number of Walmart Stores;2018-12;-;https://tabloide.walmartbrasil.com.br/;
## 81 POST_OFFICES;Total number of post offices;2019-05;-;http://www2.correios.com.br/sistemas/agencias/;
Check for missing coordinates and remove the corresponding rows.
## [1] TRUE
## [1] TRUE
brazil_cities1 <- brazil_cities %>%
  drop_na("LONG") %>%
  drop_na("LAT")
any(is.na(brazil_cities1$LONG))
## [1] FALSE
## [1] FALSE
We need to convert the data frame into an sf data frame and project it to the same CRS as the municipalities sf object for Brazil.
brazil_cities_sf <- st_as_sf(brazil_cities1,
                             coords = c("LONG", "LAT"),
                             crs = 4674) %>%
  st_transform(crs = 5641)
brazil_cities_sf
## Simple feature collection with 5564 features and 79 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1671725 ymin: 6039171 xmax: 6175358 ymax: 10507270
## CRS: EPSG:5641
## # A tibble: 5,564 x 80
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abad~ GO 0 6876 6876 0 2137
## 2 Abad~ MG 0 6704 6704 0 2328
## 3 Abad~ GO 0 15757 15609 148 4655
## 4 Abae~ MG 0 22690 22690 0 7694
## 5 Abae~ PA 0 141100 141040 60 31061
## 6 Abai~ CE 0 10496 10496 0 2791
## 7 Abaí~ BA 0 8316 8316 0 2572
## 8 Abaré BA 0 17064 17064 0 4332
## 9 Abat~ PR 0 7764 7764 0 2499
## 10 Abdo~ SC 0 2653 2653 0 848
## # ... with 5,554 more rows, and 73 more variables: IBGE_DU_URBAN <dbl>,
## # IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>,
## # `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## # AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>, ESTIMATED_POP <dbl>,
## # RURAL_URBAN <chr>, GVA_AGROPEC <dbl>, GVA_INDUSTRY <dbl>,
## # GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL ` <dbl>, TAXES <dbl>,
## # GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>, GVA_MAIN <chr>,
## # MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>, COMP_B <dbl>,
## # COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>, COMP_G <dbl>,
## # COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>, COMP_L <dbl>,
## # COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>, COMP_Q <dbl>,
## # COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>, HOTELS <dbl>,
## # BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>, Pr_Bank <dbl>,
## # Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>, geometry <POINT [m]>
We use the st_join() function to attach the city attributes to the municipality polygons.
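As a minimal sketch of a spatial join, assuming polygons on the left and points on the right, the toy example below joins one "city" point to two square "municipalities". The `sq()` helper, the geometries and the values are hypothetical; only the column names CITY and GDP_CAPITA come from the dataset.

```r
library(sf)

# Helper that builds a unit square with its lower-left corner at (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two toy polygons and one toy point that falls inside the first polygon.
polys <- st_sf(code = c(1, 2), geometry = st_sfc(sq(0), sq(2), crs = 5641))
pts <- st_sf(
  CITY = "A",
  GDP_CAPITA = 100,
  geometry = st_sfc(st_point(c(0.5, 0.5)), crs = 5641)
)

# By default st_join() is a left join using st_intersects():
# every polygon is kept, and unmatched polygons get NA attributes.
joined <- st_join(polys, pts)
```

In this toy result the second polygon has `NA` for CITY and GDP_CAPITA because no point falls inside it. A polygon containing several points would appear once per matching point.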
## Rows: 5,572
## Columns: 84
## $ code_mn <dbl> 1100015, 1100023, 1100031, 1100049, 110005...
## $ name_mn <fct> Alta Floresta D'oeste, Ariquemes, Cabixi, ...
## $ cod_stt <fct> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11...
## $ abbrv_s <fct> RO, RO, RO, RO, RO, RO, RO, RO, RO, RO, RO...
## $ CITY <chr> "Alta Floresta D'Oeste", "Ariquemes", "Cab...
## $ STATE <chr> "RO", "RO", "RO", "RO", "RO", "RO", "RO", ...
## $ CAPITAL <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IBGE_RES_POP <dbl> 24392, 90353, 6313, 78574, 17029, 18591, 8...
## $ IBGE_RES_POP_BRAS <dbl> 24392, 90240, 6310, 78536, 17011, 18562, 8...
## $ IBGE_RES_POP_ESTR <dbl> 0, 113, 3, 38, 18, 29, 5, 216, 9, 947, 27,...
## $ IBGE_DU <dbl> 7276, 27236, 1979, 24045, 5346, 5962, 2655...
## $ IBGE_DU_URBAN <dbl> 4306, 22971, 887, 19411, 4561, 4441, 834, ...
## $ IBGE_DU_RURAL <dbl> 2970, 4265, 1092, 4634, 785, 1521, 1821, 1...
## $ IBGE_POP <dbl> 13804, 69068, 2689, 61699, 13755, 13404, 2...
## $ IBGE_1 <dbl> 203, 1122, 43, 923, 200, 189, 47, 137, 324...
## $ `IBGE_1-4` <dbl> 869, 4383, 191, 3599, 827, 822, 159, 644, ...
## $ `IBGE_5-9` <dbl> 1159, 6101, 206, 4790, 1129, 1002, 238, 84...
## $ `IBGE_10-14` <dbl> 1281, 6870, 272, 5722, 1333, 1196, 240, 90...
## $ `IBGE_15-59` <dbl> 9023, 46000, 1742, 42027, 8976, 8801, 1685...
## $ `IBGE_60+` <dbl> 1269, 4592, 235, 4638, 1290, 1394, 190, 48...
## $ IBGE_PLANTED_AREA <dbl> 18288, 7115, 41086, 18180, 52326, 6268, 71...
## $ `IBGE_CROP_PRODUCTION_$` <dbl> 101470, 33365, 107975, 174665, 150755, 217...
## $ `IDHM Ranking 2010` <dbl> 3283, 1847, 3117, 1377, 2141, 2326, 3865, ...
## $ IDHM <dbl> 0.640, 0.700, 0.650, 0.718, 0.690, 0.685, ...
## $ IDHM_Renda <dbl> 0.657, 0.716, 0.650, 0.727, 0.688, 0.676, ...
## $ IDHM_Longevidade <dbl> 0.763, 0.806, 0.757, 0.821, 0.799, 0.814, ...
## $ IDHM_Educacao <dbl> 0.526, 0.600, 0.559, 0.620, 0.602, 0.584, ...
## $ ALT <dbl> 337.74, 138.69, 236.06, 177.45, 262.81, 41...
## $ PAY_TV <dbl> 240, 2267, 50, 1806, 307, 235, 66, 192, 28...
## $ FIXED_PHONES <dbl> 687, 9191, 188, 6491, 1215, 1107, 301, 346...
## $ AREA <dbl> 7067.03, 4426.57, 1314.35, 3792.89, 2783.3...
## $ REGIAO_TUR <chr> "Vale Do Guaporé", "Vale Do Jamari", "Vale...
## $ CATEGORIA_TUR <chr> "D", "C", "D", "C", NA, NA, NA, "D", "D", ...
## $ ESTIMATED_POP <dbl> 23167, 106168, 5438, 84813, 16444, 16227, ...
## $ RURAL_URBAN <chr> "Intermediário Adjacente", "Urbano", "Rura...
## $ GVA_AGROPEC <dbl> 166143.38, 145068.79, 59081.09, 188004.72,...
## $ GVA_INDUSTRY <dbl> 31.27, 353163.07, 4623.27, 230109.20, 2244...
## $ GVA_SERVICES <dbl> 114455.32, 879.97, 24091.22, 846490.22, 17...
## $ GVA_PUBLIC <dbl> 142727.55, 589088.78, 40041.59, 485242.29,...
## $ ` GVA_TOTAL ` <dbl> 454596.58, 1967287.27, 127837.15, 1749846....
## $ TAXES <dbl> 23186.17, 216095.93, 5.51, 194940.22, 58.1...
## $ GDP <dbl> 477782.74, 2183383.20, 133345.39, 1944786....
## $ POP_GDP <dbl> 25506, 105896, 6289, 87877, 17959, 18639, ...
## $ GDP_CAPITA <dbl> 18732.17, 20618.18, 21202.96, 22130.78, 22...
## $ GVA_MAIN <chr> "Administração, defesa, educação e saúde p...
## $ MUN_EXPENDIT <dbl> 50218466, 201979608, 17904387, 168366053, ...
## $ COMP_TOT <dbl> 508, 2221, 60, 1846, 379, 264, 94, 322, 61...
## $ COMP_A <dbl> 3, 24, 0, 13, 6, 3, 2, 1, 1, 2, 7, 13, 7, ...
## $ COMP_B <dbl> 1, 28, 0, 9, 0, 2, 0, 0, 2, 1, 4, 11, 5, 0...
## $ COMP_C <dbl> 38, 223, 7, 180, 40, 29, 9, 37, 96, 27, 87...
## $ COMP_D <dbl> 7, 1, 0, 1, 0, 3, 0, 0, 2, 0, 0, 0, 0, 0, ...
## $ COMP_E <dbl> 4, 12, 0, 7, 1, 0, 0, 0, 0, 2, 4, 13, 2, 0...
## $ COMP_F <dbl> 26, 79, 1, 80, 7, 11, 2, 10, 31, 10, 45, 1...
## $ COMP_G <dbl> 270, 1054, 32, 839, 185, 147, 54, 158, 306...
## $ COMP_H <dbl> 23, 68, 2, 76, 19, 11, 1, 10, 31, 18, 53, ...
## $ COMP_I <dbl> 20, 136, 4, 102, 20, 13, 3, 14, 15, 32, 48...
## $ COMP_J <dbl> 8, 27, 2, 34, 3, 3, 0, 2, 5, 5, 17, 54, 2,...
## $ COMP_K <dbl> 1, 32, 0, 19, 1, 3, 0, 1, 3, 2, 9, 30, 2, ...
## $ COMP_L <dbl> 3, 12, 0, 31, 3, 0, 0, 0, 5, 0, 7, 29, 7, ...
## $ COMP_M <dbl> 12, 91, 1, 83, 15, 8, 3, 4, 17, 10, 27, 13...
## $ COMP_N <dbl> 10, 116, 2, 93, 8, 9, 4, 6, 15, 12, 33, 14...
## $ COMP_O <dbl> 4, 5, 2, 4, 3, 4, 4, 4, 4, 5, 3, 4, 4, 3, ...
## $ COMP_P <dbl> 21, 68, 2, 52, 10, 5, 2, 10, 16, 12, 40, 1...
## $ COMP_Q <dbl> 13, 74, 1, 104, 12, 4, 4, 6, 20, 9, 37, 12...
## $ COMP_R <dbl> 6, 28, 0, 18, 2, 1, 0, 3, 5, 3, 11, 27, 3,...
## $ COMP_S <dbl> 38, 143, 4, 101, 44, 8, 6, 56, 38, 30, 86,...
## $ COMP_T <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COMP_U <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ HOTELS <dbl> NA, 1, 1, 1, 1, NA, NA, 2, NA, 2, NA, 2, N...
## $ BEDS <dbl> NA, 78, 27, 40, 40, NA, NA, 142, NA, 67, N...
## $ Pr_Agencies <dbl> 1, 2, NA, 2, 1, 1, NA, 0, 1, 1, 1, 5, 1, 0...
## $ Pu_Agencies <dbl> 2, 4, NA, 3, 3, 2, NA, 1, 2, 3, 3, 5, 1, 1...
## $ Pr_Bank <dbl> 1, 2, NA, 2, 1, 1, NA, 0, 1, 1, 1, 3, 1, 0...
## $ Pu_Bank <dbl> 2, 3, NA, 3, 3, 2, NA, 1, 2, 3, 3, 3, 1, 1...
## $ Pr_Assets <dbl> 38958866, 156637075, NA, 145453086, 579390...
## $ Pu_Assets <dbl> 245288714, 2487942240, NA, 2339395610, 392...
## $ Cars <dbl> 2464, 17513, 925, 17561, 2948, 3158, 831, ...
## $ Motorcycles <dbl> 9268, 42664, 1838, 38073, 6439, 6874, 2380...
## $ Wheeled_tractor <dbl> 1, 1, 0, 3, 0, 0, 0, 0, 4, 1, 2, 9, 1, 0, ...
## $ UBER <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MAC <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ `WAL-MART` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ POST_OFFICES <dbl> 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, ...
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((2864555 868.....
## code_mn name_mn cod_stt abbrv_s
## Min. :1100015 Bom Jesus : 5 31 : 853 MG : 853
## 1st Qu.:2512175 São Domingos: 5 35 : 645 SP : 645
## Median :3146354 Bonito : 4 43 : 499 RS : 499
## Mean :3253966 Planalto : 4 29 : 417 BA : 417
## 3rd Qu.:4119264 Santa Helena: 4 41 : 399 PR : 399
## Max. :5300108 Santa Inês : 4 42 : 295 SC : 295
## (Other) :5546 (Other):2464 (Other):2464
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5572 Length:5572 Min. :0.000000 Min. : 805
## Class :character Class :character 1st Qu.:0.000000 1st Qu.: 5236
## Mode :character Mode :character Median :0.000000 Median : 10936
## Mean :0.004853 Mean : 34288
## 3rd Qu.:0.000000 3rd Qu.: 23468
## Max. :1.000000 Max. :11253503
## NA's :8 NA's :9
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.00 Min. : 239 Min. : 60
## 1st Qu.: 5230 1st Qu.: 0.00 1st Qu.: 1573 1st Qu.: 874
## Median : 10934 Median : 0.00 Median : 3176 Median : 1850
## Mean : 34210 Mean : 77.53 Mean : 10306 Mean : 8862
## 3rd Qu.: 23394 3rd Qu.: 10.00 3rd Qu.: 6727 3rd Qu.: 4627
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :9 NA's :9 NA's :11 NA's :11
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3 Min. : 174 Min. : 0.0 Min. : 5.0
## 1st Qu.: 487 1st Qu.: 2802 1st Qu.: 38.0 1st Qu.: 158.0
## Median : 931 Median : 6177 Median : 92.0 Median : 377.0
## Mean : 1463 Mean : 27604 Mean : 383.4 Mean : 1545.1
## 3rd Qu.: 1832 3rd Qu.: 15304 3rd Qu.: 232.0 3rd Qu.: 951.5
## Max. :33809 Max. :10463636 Max. :129464.0 Max. :514794.0
## NA's :82 NA's :9 NA's :9 NA's :9
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7.0 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220.5 1st Qu.: 260 1st Qu.: 1735 1st Qu.: 341
## Median : 516.0 Median : 589 Median : 3842 Median : 723
## Mean : 2070.0 Mean : 2382 Mean : 18218 Mean : 3005
## 3rd Qu.: 1300.5 3rd Qu.: 1478 3rd Qu.: 9630 3rd Qu.: 1724
## Max. :684443.0 Max. :783702 Max. :7058221 Max. :1293012
## NA's :9 NA's :9 NA's :9 NA's :9
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 911.5 1st Qu.: 2330 1st Qu.:1392 1st Qu.:0.5990
## Median : 3473.0 Median : 13845 Median :2782 Median :0.6650
## Mean : 14172.6 Mean : 57362 Mean :2783 Mean :0.6592
## 3rd Qu.: 11172.5 3rd Qu.: 55579 3rd Qu.:4174 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :9 NA's :9 NA's :8 NA's :8
## IDHM_Renda IDHM_Longevidade IDHM_Educacao ALT
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. : 0.0
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.: 169.8
## Median :0.6540 Median :0.8080 Median :0.5600 Median : 406.5
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean : 893.8
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.: 628.9
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :874579.0
## NA's :8 NA's :8 NA's :8 NA's :8
## PAY_TV FIXED_PHONES AREA REGIAO_TUR
## Min. : 1 Min. : 3 Min. : 3.57 Length:5572
## 1st Qu.: 88 1st Qu.: 119 1st Qu.: 204.49 Class :character
## Median : 247 Median : 328 Median : 415.87 Mode :character
## Mean : 3097 Mean : 6574 Mean : 1515.73
## 3rd Qu.: 816 3rd Qu.: 1151 3rd Qu.: 1026.16
## Max. :2047668 Max. :5543127 Max. :159533.33
## NA's :8 NA's :8 NA's :10
## CATEGORIA_TUR ESTIMATED_POP RURAL_URBAN GVA_AGROPEC
## Length:5572 Min. : 786 Length:5572 Min. : 0
## Class :character 1st Qu.: 5454 Class :character 1st Qu.: 4190
## Mode :character Median : 11591 Mode :character Median : 20430
## Mean : 37463 Mean : 47268
## 3rd Qu.: 25306 3rd Qu.: 51216
## Max. :12176866 Max. :1402282
## NA's :8 NA's :9
## GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL
## Min. : 1 Min. : 2 Min. : 7 Min. : 17
## 1st Qu.: 1725 1st Qu.: 10117 1st Qu.: 17256 1st Qu.: 42345
## Median : 7425 Median : 31216 Median : 35865 Median : 119504
## Mean : 176063 Mean : 490028 Mean : 123879 Mean : 833879
## 3rd Qu.: 40995 3rd Qu.: 115583 3rd Qu.: 89339 3rd Qu.: 314089
## Max. :63306755 Max. :464656988 Max. :41902893 Max. :569910503
## NA's :9 NA's :9 NA's :9 NA's :9
## TAXES GDP POP_GDP GDP_CAPITA
## Min. : -14159 Min. : 15 Min. : 815 Min. : 3191
## 1st Qu.: 1302 1st Qu.: 43676 1st Qu.: 5488 1st Qu.: 9062
## Median : 5107 Median : 125111 Median : 11584 Median : 15866
## Mean : 119000 Mean : 955528 Mean : 37035 Mean : 21093
## 3rd Qu.: 22208 3rd Qu.: 329361 3rd Qu.: 25108 3rd Qu.: 26155
## Max. :117125387 Max. :687035890 Max. :12038175 Max. :314638
## NA's :9 NA's :9 NA's :9 NA's :9
## GVA_MAIN MUN_EXPENDIT COMP_TOT COMP_A
## Length:5572 Min. :1.421e+06 Min. : 6.0 Min. : 0.00
## Class :character 1st Qu.:1.574e+07 1st Qu.: 68.0 1st Qu.: 1.00
## Mode :character Median :2.746e+07 Median : 162.0 Median : 2.00
## Mean :1.044e+08 Mean : 907.8 Mean : 18.27
## 3rd Qu.:5.679e+07 3rd Qu.: 449.5 3rd Qu.: 8.00
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00
## NA's :1497 NA's :9 NA's :9
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.00 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.000 Median : 11.00 Median : 0.0000 Median : 0.000
## Mean : 1.853 Mean : 73.53 Mean : 0.4264 Mean : 2.031
## 3rd Qu.: 2.000 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :274.000 Max. :31566.00 Max. :332.0000 Max. :657.000
## NA's :9 NA's :9 NA's :9 NA's :9
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.0 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.0 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.0 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.3 Mean : 348.4 Mean : 41.04 Mean : 55.94
## 3rd Qu.: 15.0 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.0 Max. :150633.0 Max. :19515.00 Max. :29290.00
## NA's :9 NA's :9 NA's :9 NA's :9
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.77 Mean : 15.57 Mean : 15.16 Mean : 51.35
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
## NA's :9 NA's :9 NA's :9 NA's :9
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.0 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.0 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.0 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.8 Mean : 3.271 Mean : 30.99 Mean : 34.19
## 3rd Qu.: 14.0 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.0 Max. :204.000 Max. :16030.00 Max. :22248.00
## NA's :9 NA's :9 NA's :9 NA's :9
## COMP_R COMP_S COMP_T COMP_U
## Min. : 0.00 Min. : 0.00 Min. :0 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median :0 Median : 0.00000
## Mean : 12.19 Mean : 51.66 Mean :0 Mean : 0.05033
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :0 Max. :123.00000
## NA's :9 NA's :9 NA's :9 NA's :9
## HOTELS BEDS Pr_Agencies Pu_Agencies
## Min. : 1.000 Min. : 2.0 Min. : 0.000 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 1.000 Median : 82.0 Median : 1.000 Median : 2.00
## Mean : 3.131 Mean : 257.5 Mean : 3.384 Mean : 2.83
## 3rd Qu.: 3.000 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.00
## Max. :97.000 Max. :13247.0 Max. :1693.000 Max. :626.00
## NA's :4685 NA's :4685 NA's :2231 NA's :2231
## Pr_Bank Pu_Bank Pr_Assets Pu_Assets
## Min. : 0.000 Min. :0.00 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.: 0.000 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.048e+07
## Median : 1.000 Median :2.00 Median :3.234e+07 Median :1.339e+08
## Mean : 1.312 Mean :1.58 Mean :9.183e+09 Mean :6.007e+09
## 3rd Qu.: 2.000 3rd Qu.:2.00 3rd Qu.:1.149e+08 3rd Qu.:4.976e+08
## Max. :83.000 Max. :8.00 Max. :1.947e+13 Max. :8.016e+12
## NA's :2231 NA's :2231 NA's :2231 NA's :2231
## Cars Motorcycles Wheeled_tractor UBER
## Min. : 2 Min. : 4 Min. : 0.000 Min. :1
## 1st Qu.: 602 1st Qu.: 591 1st Qu.: 0.000 1st Qu.:1
## Median : 1440 Median : 1286 Median : 0.000 Median :1
## Mean : 9869 Mean : 4883 Mean : 5.759 Mean :1
## 3rd Qu.: 4091 3rd Qu.: 3299 3rd Qu.: 1.000 3rd Qu.:1
## Max. :5740995 Max. :1134570 Max. :3236.000 Max. :1
## NA's :16 NA's :16 NA's :16 NA's :5447
## MAC WAL-MART POST_OFFICES geometry
## Min. : 1.000 Min. : 1.000 Min. : 1.000 MULTIPOLYGON :5572
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 epsg:5641 : 0
## Median : 2.000 Median : 1.000 Median : 1.000 +proj=merc...: 0
## Mean : 4.277 Mean : 2.059 Mean : 2.081
## 3rd Qu.: 3.000 3rd Qu.: 1.750 3rd Qu.: 2.000
## Max. :130.000 Max. :26.000 Max. :225.000
## NA's :5406 NA's :5470 NA's :126
The summary above covers 84 columns, but we do not need all of them for our analysis, so let us remove the variables that are not important.
To make feature extraction more efficient, I will eliminate columns that contain excessive NA values or that are not relevant as predictors of GDP per capita.
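A minimal sketch of how the NA share per column could be checked before deciding what to drop. The toy data frame and the 40% threshold are assumptions for illustration only, not values from this analysis. On an sf object the geometry column would have to be set aside first (for example with `st_drop_geometry()`).

```r
# Toy stand-in for the municipality table (hypothetical values)
toy <- data.frame(
  GDP_CAPITA = c(3191, 15866, NA, 26155),
  UBER       = c(NA, NA, NA, 1),
  HOTELS     = c(NA, 1, NA, 2)
)

# Share of missing values in each column
na_share <- colMeans(is.na(toy))

# Columns whose NA share exceeds an assumed threshold of 40%
high_na <- names(na_share)[na_share > 0.4]

print(sort(na_share, decreasing = TRUE))
print(high_na)
```

Columns flagged this way are candidates for removal; whether a column is also irrelevant as a predictor remains a judgment call.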
# Keep only rows with a GDP per capita value, then drop unneeded columns
brazil <- brazil_muni %>%
  drop_na("GDP_CAPITA") %>%
  mutate(`WAL-MART` = NULL,
         MAC = NULL,
         UBER = NULL,
         Pr_Bank = NULL,
         Pu_Bank = NULL,
         Pr_Assets = NULL,
         Pu_Assets = NULL,
         Pr_Agencies = NULL,
         Pu_Agencies = NULL,
         HOTELS = NULL,
         BEDS = NULL,
         cod_stt = NULL,
         abbrv_s = NULL,
         CITY = NULL,
         STATE = NULL,
         CAPITAL = NULL,
         REGIAO_TUR = NULL,
         CATEGORIA_TUR = NULL,
         MUN_EXPENDIT = NULL,
         POST_OFFICES = NULL,
         code_mn = NULL,
         name_mn = NULL,
         IBGE_1 = NULL,
         `IBGE_1-4` = NULL,
         `IBGE_5-9` = NULL,
         `IBGE_10-14` = NULL,
         `IBGE_60+` = NULL,
         GVA_MAIN = NULL,
         `IDHM Ranking 2010` = NULL,
         " GVA_TOTAL " = NULL)
summary(brazil)
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU
## Min. : 805 Min. : 805 Min. : 0.00 Min. : 239
## 1st Qu.: 5236 1st Qu.: 5230 1st Qu.: 0.00 1st Qu.: 1573
## Median : 10936 Median : 10934 Median : 0.00 Median : 3176
## Mean : 34288 Mean : 34210 Mean : 77.53 Mean : 10306
## 3rd Qu.: 23468 3rd Qu.: 23394 3rd Qu.: 10.00 3rd Qu.: 6727
## Max. :11253503 Max. :11133776 Max. :119727.00 Max. :3576148
## NA's :2
## IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_15-59
## Min. : 60 Min. : 3 Min. : 174 Min. : 94
## 1st Qu.: 874 1st Qu.: 487 1st Qu.: 2802 1st Qu.: 1735
## Median : 1850 Median : 931 Median : 6177 Median : 3842
## Mean : 8862 Mean : 1463 Mean : 27604 Mean : 18218
## 3rd Qu.: 4627 3rd Qu.: 1832 3rd Qu.: 15304 3rd Qu.: 9630
## Max. :3548433 Max. :33809 Max. :10463636 Max. :7058221
## NA's :2 NA's :73
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM IDHM_Renda
## Min. : 0.0 Min. : 0 Min. :0.4180 Min. :0.4000
## 1st Qu.: 911.5 1st Qu.: 2330 1st Qu.:0.5990 1st Qu.:0.5720
## Median : 3473.0 Median : 13845 Median :0.6650 Median :0.6540
## Mean : 14172.6 Mean : 57362 Mean :0.6592 Mean :0.6429
## 3rd Qu.: 11172.5 3rd Qu.: 55579 3rd Qu.:0.7180 3rd Qu.:0.7070
## Max. :1205669.0 Max. :3274885 Max. :0.8620 Max. :0.8910
##
## IDHM_Longevidade IDHM_Educacao ALT PAY_TV
## Min. :0.6720 Min. :0.2070 Min. : 0.0 Min. : 1
## 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.: 169.7 1st Qu.: 88
## Median :0.8080 Median :0.5600 Median : 406.5 Median : 247
## Mean :0.8016 Mean :0.5591 Mean : 894.0 Mean : 3098
## 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.: 629.0 3rd Qu.: 816
## Max. :0.8940 Max. :0.8250 Max. :874579.0 Max. :2047668
##
## FIXED_PHONES AREA ESTIMATED_POP RURAL_URBAN
## Min. : 3 Min. : 3.57 Min. : 786 Length:5563
## 1st Qu.: 119 1st Qu.: 204.49 1st Qu.: 5454 Class :character
## Median : 328 Median : 415.87 Median : 11591 Mode :character
## Mean : 6575 Mean : 1515.73 Mean : 37468
## 3rd Qu.: 1151 3rd Qu.: 1026.16 3rd Qu.: 25308
## Max. :5543127 Max. :159533.33 Max. :12176866
## NA's :1
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4190 1st Qu.: 1725 1st Qu.: 10117 1st Qu.: 17256
## Median : 20430 Median : 7425 Median : 31216 Median : 35865
## Mean : 47268 Mean : 176063 Mean : 490028 Mean : 123879
## 3rd Qu.: 51216 3rd Qu.: 40995 3rd Qu.: 115583 3rd Qu.: 89339
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## TAXES GDP POP_GDP GDP_CAPITA
## Min. : -14159 Min. : 15 Min. : 815 Min. : 3191
## 1st Qu.: 1302 1st Qu.: 43676 1st Qu.: 5488 1st Qu.: 9062
## Median : 5107 Median : 125111 Median : 11584 Median : 15866
## Mean : 119000 Mean : 955528 Mean : 37035 Mean : 21093
## 3rd Qu.: 22208 3rd Qu.: 329361 3rd Qu.: 25108 3rd Qu.: 26155
## Max. :117125387 Max. :687035890 Max. :12038175 Max. :314638
##
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 162.0 Median : 2.00 Median : 0.000 Median : 11.00
## Mean : 907.8 Mean : 18.27 Mean : 1.853 Mean : 73.53
## 3rd Qu.: 449.5 3rd Qu.: 8.00 3rd Qu.: 2.000 3rd Qu.: 39.00
## Max. :530446.0 Max. :1948.00 Max. :274.000 Max. :31566.00
##
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.0 Min. : 1.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.0 1st Qu.: 32.0
## Median : 0.0000 Median : 0.000 Median : 4.0 Median : 75.0
## Mean : 0.4264 Mean : 2.031 Mean : 43.3 Mean : 348.4
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.0 3rd Qu.: 200.0
## Max. :332.0000 Max. :657.000 Max. :25222.0 Max. :150633.0
##
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7.00 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41.04 Mean : 55.94 Mean : 24.77 Mean : 15.57
## 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515.00 Max. :29290.00 Max. :38720.00 Max. :23738.00
##
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 1.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 0.00 Median : 4.00 Median : 4.0 Median : 2.000
## Mean : 15.16 Mean : 51.35 Mean : 83.8 Mean : 3.271
## 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.0 3rd Qu.: 3.000
## Max. :14003.00 Max. :49181.00 Max. :76757.0 Max. :204.000
##
## COMP_P COMP_Q COMP_R COMP_S
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.: 5.00
## Median : 6.00 Median : 3.00 Median : 2.00 Median : 12.00
## Mean : 30.99 Mean : 34.19 Mean : 12.19 Mean : 51.66
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00 3rd Qu.: 31.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00 Max. :24832.00
##
## COMP_T COMP_U Cars Motorcycles
## Min. :0 Min. : 0.00000 Min. : 2 Min. : 4
## 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 602 1st Qu.: 591
## Median :0 Median : 0.00000 Median : 1440 Median : 1286
## Mean :0 Mean : 0.05033 Mean : 9870 Mean : 4884
## 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 4092 3rd Qu.: 3299
## Max. :0 Max. :123.00000 Max. :5740995 Max. :1134570
## NA's :8 NA's :8
## Wheeled_tractor geometry
## Min. : 0.00 MULTIPOLYGON :5563
## 1st Qu.: 0.00 epsg:5641 : 0
## Median : 0.00 +proj=merc...: 0
## Mean : 5.76
## 3rd Qu.: 1.00
## Max. :3236.00
## NA's :8
I will create 15 new variables for the analysis, which I believe are useful for explaining GDP per capita.
brazil1 <- brazil %>%
  mutate(ECONOMY_ACTIVE_RATIO = (`IBGE_15-59`/IBGE_RES_POP),
         HDI_EDUCATION = IDHM_Educacao,
         HDI_LONGEVITY = IDHM_Longevidade,
         TAX_GDP_RATIO = (TAXES/GDP),
         CROP_PRODUCTION = `IBGE_CROP_PRODUCTION_$` * 1000,
         GVA_AGROPEC_CAPITA = (GVA_AGROPEC/ESTIMATED_POP) * 1000,
         GVA_INDUSTRY_CAPITA = (GVA_INDUSTRY/ESTIMATED_POP) * 1000,
         GVA_PUBLIC_CAPITA = (GVA_PUBLIC/ESTIMATED_POP) * 1000,
         GVA_SERVICES_CAPITA = (GVA_SERVICES/ESTIMATED_POP) * 1000,
         FIXED_PHONES_PR = (FIXED_PHONES/ESTIMATED_POP),
         TV_PR = (PAY_TV/ESTIMATED_POP),
         CAR_PR = (Cars/ESTIMATED_POP),
         MOTORCYCLES_PR = (Motorcycles/ESTIMATED_POP),
         TRACTOR_PR = (Wheeled_tractor/ESTIMATED_POP),
         POPULATION_DENSITY = ESTIMATED_POP/AREA) %>%
  # Remove rows where the derived ratios could not be computed
  drop_na("TV_PR") %>%
  drop_na("CAR_PR") %>%
  drop_na("POPULATION_DENSITY")
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU
## Min. : 805 Min. : 805 Min. : 0.00 Min. : 239
## 1st Qu.: 5238 1st Qu.: 5233 1st Qu.: 0.00 1st Qu.: 1575
## Median : 10943 Median : 10942 Median : 0.00 Median : 3180
## Mean : 34323 Mean : 34245 Mean : 77.65 Mean : 10317
## 3rd Qu.: 23593 3rd Qu.: 23469 3rd Qu.: 10.00 3rd Qu.: 6728
## Max. :11253503 Max. :11133776 Max. :119727.00 Max. :3576148
## NA's :2
## IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_15-59
## Min. : 60 Min. : 3 Min. : 174 Min. : 94
## 1st Qu.: 876 1st Qu.: 487 1st Qu.: 2803 1st Qu.: 1735
## Median : 1852 Median : 931 Median : 6183 Median : 3848
## Mean : 8872 Mean : 1463 Mean : 27634 Mean : 18238
## 3rd Qu.: 4627 3rd Qu.: 1834 3rd Qu.: 15305 3rd Qu.: 9632
## Max. :3548433 Max. :33809 Max. :10463636 Max. :7058221
## NA's :2 NA's :73
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM IDHM_Renda
## Min. : 0.0 Min. : 0 Min. :0.4180 Min. :0.4000
## 1st Qu.: 913.2 1st Qu.: 2328 1st Qu.:0.5990 1st Qu.:0.5720
## Median : 3477.5 Median : 13859 Median :0.6650 Median :0.6540
## Mean : 14166.0 Mean : 57333 Mean :0.6593 Mean :0.6429
## 3rd Qu.: 11173.2 3rd Qu.: 55619 3rd Qu.:0.7200 3rd Qu.:0.7070
## Max. :1205669.0 Max. :3274885 Max. :0.8620 Max. :0.8910
##
## IDHM_Longevidade IDHM_Educacao ALT PAY_TV
## Min. :0.6720 Min. :0.2070 Min. : 0.0 Min. : 1.0
## 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.: 170.3 1st Qu.: 88.0
## Median :0.8080 Median :0.5600 Median : 406.6 Median : 248.0
## Mean :0.8016 Mean :0.5592 Mean : 894.9 Mean : 3101.9
## 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.: 629.2 3rd Qu.: 816.8
## Max. :0.8940 Max. :0.8250 Max. :874579.0 Max. :2047668.0
##
## FIXED_PHONES AREA ESTIMATED_POP RURAL_URBAN
## Min. : 3 Min. : 3.57 Min. : 786 Length:5554
## 1st Qu.: 119 1st Qu.: 204.68 1st Qu.: 5456 Class :character
## Median : 328 Median : 415.87 Median : 11602 Mode :character
## Mean : 6585 Mean : 1516.70 Mean : 37507
## 3rd Qu.: 1155 3rd Qu.: 1025.40 3rd Qu.: 25323
## Max. :5543127 Max. :159533.33 Max. :12176866
##
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4193 1st Qu.: 1727 1st Qu.: 10115 1st Qu.: 17254
## Median : 20436 Median : 7442 Median : 31242 Median : 35866
## Mean : 47269 Mean : 176316 Mean : 490695 Mean : 123988
## 3rd Qu.: 51241 3rd Qu.: 41003 3rd Qu.: 115614 3rd Qu.: 89351
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## TAXES GDP POP_GDP GDP_CAPITA
## Min. : -14159 Min. : 15 Min. : 815 Min. : 3191
## 1st Qu.: 1301 1st Qu.: 43706 1st Qu.: 5494 1st Qu.: 9063
## Median : 5108 Median : 125473 Median : 11588 Median : 15877
## Mean : 119169 Mean : 956868 Mean : 37072 Mean : 21104
## 3rd Qu.: 22208 3rd Qu.: 329764 3rd Qu.: 25116 3rd Qu.: 26156
## Max. :117125387 Max. :687035890 Max. :12038175 Max. :314638
##
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.0 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.0 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 163.0 Median : 2.0 Median : 0.000 Median : 11.00
## Mean : 908.9 Mean : 18.3 Mean : 1.856 Mean : 73.62
## 3rd Qu.: 449.8 3rd Qu.: 8.0 3rd Qu.: 2.000 3rd Qu.: 39.75
## Max. :530446.0 Max. :1948.0 Max. :274.000 Max. :31566.00
##
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 1.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 32.0
## Median : 0.0000 Median : 0.000 Median : 4.00 Median : 75.0
## Mean : 0.4271 Mean : 2.032 Mean : 43.36 Mean : 348.8
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00 3rd Qu.: 200.0
## Max. :332.0000 Max. :657.000 Max. :25222.00 Max. :150633.0
##
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7.00 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41.09 Mean : 56.01 Mean : 24.81 Mean : 15.59
## 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515.00 Max. :29290.00 Max. :38720.00 Max. :23738.00
##
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 1.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 2.000
## Median : 0.00 Median : 4.00 Median : 4.00 Median : 2.000
## Mean : 15.18 Mean : 51.43 Mean : 83.92 Mean : 3.272
## 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00 3rd Qu.: 3.000
## Max. :14003.00 Max. :49181.00 Max. :76757.00 Max. :204.000
##
## COMP_P COMP_Q COMP_R COMP_S
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.: 5.00
## Median : 6.00 Median : 3.00 Median : 2.00 Median : 12.50
## Mean : 31.02 Mean : 34.24 Mean : 12.21 Mean : 51.73
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00 3rd Qu.: 31.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00 Max. :24832.00
##
## COMP_T COMP_U Cars Motorcycles
## Min. :0 Min. : 0.00000 Min. : 2 Min. : 4.0
## 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 602 1st Qu.: 591.2
## Median :0 Median : 0.00000 Median : 1440 Median : 1287.0
## Mean :0 Mean : 0.05041 Mean : 9872 Mean : 4884.9
## 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 4094 3rd Qu.: 3299.0
## Max. :0 Max. :123.00000 Max. :5740995 Max. :1134570.0
##
## Wheeled_tractor geometry ECONOMY_ACTIVE_RATIO HDI_EDUCATION
## Min. : 0.000 MULTIPOLYGON :5554 Min. :0.02557 Min. :0.2070
## 1st Qu.: 0.000 epsg:5641 : 0 1st Qu.:0.28284 1st Qu.:0.4900
## Median : 0.000 +proj=merc...: 0 Median :0.39614 Median :0.5600
## Mean : 5.761 Mean :0.39763 Mean :0.5592
## 3rd Qu.: 1.000 3rd Qu.:0.51729 3rd Qu.:0.6310
## Max. :3236.000 Max. :0.70989 Max. :0.8250
##
## HDI_LONGEVITY TAX_GDP_RATIO CROP_PRODUCTION GVA_AGROPEC_CAPITA
## Min. :0.6720 Min. : -2.0920 Min. :0.000e+00 Min. : 0.0
## 1st Qu.:0.7690 1st Qu.: 0.0297 1st Qu.:2.328e+06 1st Qu.: 350.5
## Median :0.8080 Median : 0.0526 Median :1.386e+07 Median : 1421.5
## Mean :0.8016 Mean : 8.5600 Mean :5.733e+07 Mean : 3775.1
## 3rd Qu.:0.8360 3rd Qu.: 0.0973 3rd Qu.:5.562e+07 3rd Qu.: 4715.5
## Max. :0.8940 Max. :324.6684 Max. :3.275e+09 Max. :120903.8
##
## GVA_INDUSTRY_CAPITA GVA_PUBLIC_CAPITA GVA_SERVICES_CAPITA
## Min. : 0.09 Min. : 2.397 Min. : 0.46
## 1st Qu.: 260.20 1st Qu.: 3270.481 1st Qu.: 1616.30
## Median : 799.03 Median : 3982.950 Median : 3578.82
## Mean : 3251.38 Mean : 3764.136 Mean : 5583.35
## 3rd Qu.: 2597.16 3rd Qu.: 4808.575 3rd Qu.: 7776.61
## Max. :253119.93 Max. :14897.704 Max. :104596.73
##
## FIXED_PHONES_PR TV_PR CAR_PR MOTORCYCLES_PR
## Min. :0.0004665 Min. :0.000195 Min. :0.0000515 Min. :0.0001029
## 1st Qu.:0.0135470 1st Qu.:0.011511 1st Qu.:0.0630405 1st Qu.:0.0876130
## Median :0.0385933 Median :0.024744 Median :0.1741079 Median :0.1234271
## Mean :0.0602580 Mean :0.036860 Mean :0.1922417 Mean :0.1343910
## 3rd Qu.:0.0852145 3rd Qu.:0.048303 3rd Qu.:0.3140151 3rd Qu.:0.1695969
## Max. :1.0230769 Max. :0.496119 Max. :0.6557617 Max. :0.5452368
##
## TRACTOR_PR POPULATION_DENSITY
## Min. :0.000e+00 Min. : 0.167
## 1st Qu.:0.000e+00 1st Qu.: 11.742
## Median :0.000e+00 Median : 25.031
## Mean :1.324e-04 Mean : 119.148
## 3rd Qu.:7.763e-05 3rd Qu.: 56.113
## Max. :1.026e-02 Max. :14005.395
##
## GDP_CAPITA ECONOMY_ACTIVE_RATIO HDI_EDUCATION HDI_LONGEVITY
## Min. : 3191 Min. :0.02557 Min. :0.2070 Min. :0.6720
## 1st Qu.: 9063 1st Qu.:0.28284 1st Qu.:0.4900 1st Qu.:0.7690
## Median : 15877 Median :0.39614 Median :0.5600 Median :0.8080
## Mean : 21104 Mean :0.39763 Mean :0.5592 Mean :0.8016
## 3rd Qu.: 26156 3rd Qu.:0.51729 3rd Qu.:0.6310 3rd Qu.:0.8360
## Max. :314638 Max. :0.70989 Max. :0.8250 Max. :0.8940
## TAX_GDP_RATIO CROP_PRODUCTION GVA_AGROPEC_CAPITA GVA_INDUSTRY_CAPITA
## Min. : -2.0920 Min. :0.000e+00 Min. : 0.0 Min. : 0.09
## 1st Qu.: 0.0297 1st Qu.:2.328e+06 1st Qu.: 350.5 1st Qu.: 260.20
## Median : 0.0526 Median :1.386e+07 Median : 1421.5 Median : 799.03
## Mean : 8.5600 Mean :5.733e+07 Mean : 3775.1 Mean : 3251.38
## 3rd Qu.: 0.0973 3rd Qu.:5.562e+07 3rd Qu.: 4715.5 3rd Qu.: 2597.16
## Max. :324.6684 Max. :3.275e+09 Max. :120903.8 Max. :253119.93
## GVA_PUBLIC_CAPITA GVA_SERVICES_CAPITA FIXED_PHONES_PR TV_PR
## Min. : 2.397 Min. : 0.46 Min. :0.0004665 Min. :0.000195
## 1st Qu.: 3270.481 1st Qu.: 1616.30 1st Qu.:0.0135470 1st Qu.:0.011511
## Median : 3982.950 Median : 3578.82 Median :0.0385933 Median :0.024744
## Mean : 3764.136 Mean : 5583.35 Mean :0.0602580 Mean :0.036860
## 3rd Qu.: 4808.575 3rd Qu.: 7776.61 3rd Qu.:0.0852145 3rd Qu.:0.048303
## Max. :14897.704 Max. :104596.73 Max. :1.0230769 Max. :0.496119
## CAR_PR MOTORCYCLES_PR TRACTOR_PR
## Min. :0.0000515 Min. :0.0001029 Min. :0.000e+00
## 1st Qu.:0.0630405 1st Qu.:0.0876130 1st Qu.:0.000e+00
## Median :0.1741079 Median :0.1234271 Median :0.000e+00
## Mean :0.1922417 Mean :0.1343910 Mean :1.324e-04
## 3rd Qu.:0.3140151 3rd Qu.:0.1695969 3rd Qu.:7.763e-05
## Max. :0.6557617 Max. :0.5452368 Max. :1.026e-02
## POPULATION_DENSITY geometry
## Min. : 0.167 MULTIPOLYGON :5554
## 1st Qu.: 11.742 epsg:5641 : 0
## Median : 25.031 +proj=merc...: 0
## Mean : 119.148
## 3rd Qu.: 56.113
## Max. :14005.395
ggplot(data = brazil2, aes(x = GDP_CAPITA)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")
The distribution of GDP per capita is right-skewed. If the linear regression model does not satisfy the linearity assumption, we might consider using log(GDP per capita).
brazil2 <- brazil2 %>%
  mutate(GDP_CAPITA_LOG = log(GDP_CAPITA))
ggplot(data = brazil2, aes(x = GDP_CAPITA_LOG)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")
As the histogram shows, a log transformation gives GDP per capita a distribution that resembles a normal distribution. However, we will only use it if the linearity assumption of the regression model does not hold.
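The effect of the log transformation can also be quantified rather than judged by eye. The sketch below computes sample skewness before and after taking logs. The `skewness()` helper and the simulated `toy_gdp` values are assumptions for illustration and are not part of the original analysis.

```r
# Moment-based sample skewness (hypothetical helper, base R only)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated right-skewed values standing in for GDP per capita
set.seed(1)
toy_gdp <- rlnorm(1000, meanlog = 9.7, sdlog = 0.7)

skew_raw <- skewness(toy_gdp)       # clearly positive: long right tail
skew_log <- skewness(log(toy_gdp))  # close to zero after the transformation

print(c(raw = skew_raw, log = skew_log))
```

A skewness near zero after the transformation is consistent with the roughly symmetric histogram of the logged variable.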
a <- ggplot(data = brazil2, aes(x = HDI_EDUCATION)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
b <- ggplot(data = brazil2, aes(x = HDI_LONGEVITY)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
c <- ggplot(data = brazil2, aes(x = ECONOMY_ACTIVE_RATIO)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
d <- ggplot(data = brazil2, aes(x = POPULATION_DENSITY)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
e <- ggplot(data = brazil2, aes(x = TAX_GDP_RATIO)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
f <- ggplot(data = brazil2, aes(x = CROP_PRODUCTION)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
g <- ggplot(data = brazil2, aes(x = GVA_AGROPEC_CAPITA)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
h <- ggplot(data = brazil2, aes(x = GVA_SERVICES_CAPITA)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
i <- ggplot(data = brazil2, aes(x = GVA_INDUSTRY_CAPITA)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
j <- ggplot(data = brazil2, aes(x = GVA_PUBLIC_CAPITA)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
k <- ggplot(data = brazil2, aes(x = FIXED_PHONES_PR)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
l <- ggplot(data = brazil2, aes(x = TV_PR)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
m <- ggplot(data = brazil2, aes(x = TRACTOR_PR)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
n <- ggplot(data = brazil2, aes(x = MOTORCYCLES_PR)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
o <- ggplot(data = brazil2, aes(x = CAR_PR)) +
geom_histogram(bins = 20, color = "black", fill = "light blue")
ggarrange(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, ncol = 3, nrow = 5)
The independent variables show a mix of skewed and approximately normal distributions.
We will use a correlation coefficient cutoff of 0.80 to eliminate highly correlated variables from the model.
corrplot(cor(brazil3[, 2:16]), diag = FALSE, order = "AOE",
         tl.pos = "td", tl.cex = 0.5, number.cex = 0.55, method = "number", type = "upper")
Since no pair of variables has a correlation of 0.80 or above, we can build our multiple linear regression model using all the variables in the corrplot.
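Reading the cutoff off a dense corrplot is error-prone, so the same check can be done programmatically. This is a minimal sketch on simulated data; the variable names `x1`, `x2`, `x3` are hypothetical, and the absolute value is used on the assumption that strong negative correlations should be flagged as well.

```r
# Simulated predictors: x2 is almost a copy of x1, x3 is independent
set.seed(42)
x1  <- rnorm(200)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(200, sd = 0.1),
                  x3 = rnorm(200))

cm <- cor(toy)

# Index pairs in the upper triangle with |r| at or above the 0.80 cutoff
high <- which(abs(cm) >= 0.80 & upper.tri(cm), arr.ind = TRUE)

flagged <- data.frame(var1 = rownames(cm)[high[, "row"]],
                      var2 = colnames(cm)[high[, "col"]],
                      r    = cm[high])
print(flagged)
```

An empty `flagged` table would correspond to the situation described above, where no pair reaches the cutoff.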
We use a 95% confidence level, so the significance level (alpha) is 0.05. Any variable with a p-value above 0.05 will be considered statistically insignificant.
brazil.mlr <- lm(formula = GDP_CAPITA ~ HDI_EDUCATION + HDI_LONGEVITY + TAX_GDP_RATIO +
                   TV_PR + FIXED_PHONES_PR + MOTORCYCLES_PR + CAR_PR + TRACTOR_PR +
                   GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA +
                   GVA_AGROPEC_CAPITA + ECONOMY_ACTIVE_RATIO + POPULATION_DENSITY +
                   CROP_PRODUCTION,
                 data = brazil3)
summary(brazil.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ HDI_EDUCATION + HDI_LONGEVITY + TAX_GDP_RATIO +
## TV_PR + FIXED_PHONES_PR + MOTORCYCLES_PR + CAR_PR + TRACTOR_PR +
## GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA +
## GVA_AGROPEC_CAPITA + ECONOMY_ACTIVE_RATIO + POPULATION_DENSITY +
## CROP_PRODUCTION, data = brazil3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31330 -2993 -1121 1023 260326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.290e+03 3.146e+03 -2.953 0.003162 **
## HDI_EDUCATION 8.287e+03 2.284e+03 3.628 0.000288 ***
## HDI_LONGEVITY 1.200e+04 4.300e+03 2.791 0.005272 **
## TAX_GDP_RATIO 1.283e+01 4.449e+00 2.883 0.003952 **
## TV_PR 7.113e+03 3.789e+03 1.878 0.060485 .
## FIXED_PHONES_PR 9.726e+03 3.159e+03 3.079 0.002090 **
## MOTORCYCLES_PR -3.000e+03 1.779e+03 -1.686 0.091758 .
## CAR_PR 5.357e+03 1.618e+03 3.311 0.000935 ***
## TRACTOR_PR -1.486e+04 2.576e+05 -0.058 0.954002
## GVA_SERVICES_CAPITA 9.995e-01 2.261e-02 44.195 < 2e-16 ***
## GVA_INDUSTRY_CAPITA 1.142e+00 1.284e-02 88.908 < 2e-16 ***
## GVA_PUBLIC_CAPITA 4.735e-01 5.994e-02 7.901 3.32e-15 ***
## GVA_AGROPEC_CAPITA 8.844e-01 2.112e-02 41.884 < 2e-16 ***
## ECONOMY_ACTIVE_RATIO -7.308e+02 1.098e+03 -0.665 0.505837
## POPULATION_DENSITY 3.277e-01 2.002e-01 1.637 0.101721
## CROP_PRODUCTION 6.882e-06 8.628e-07 7.976 1.82e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8479 on 5538 degrees of freedom
## Multiple R-squared: 0.8246, Adjusted R-squared: 0.8241
## F-statistic: 1736 on 15 and 5538 DF, p-value: < 2.2e-16
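We drop HDI_LONGEVITY, TV_PR, TRACTOR_PR and POPULATION_DENSITY from the model. Of these, TV_PR (p = 0.060), TRACTOR_PR (p = 0.954) and POPULATION_DENSITY (p = 0.102) exceed the 0.05 threshold. Note that HDI_LONGEVITY (p = 0.005) is significant in the model above, while MOTORCYCLES_PR (p = 0.092) and ECONOMY_ACTIVE_RATIO (p = 0.506) also exceed 0.05 but are kept in the revised model.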
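To avoid misreading the coefficient table, the p-values can be pulled out of the fitted model and compared with alpha directly. The sketch below uses a simulated data set; `toy`, `x1`, `x2` and `noise` are hypothetical names and not variables from this analysis.

```r
# Simulated data: y depends on x1 and x2, but not on noise
set.seed(7)
n   <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
toy$y <- 2 + 3 * toy$x1 - 1.5 * toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + noise, data = toy)

# p-values of all slope coefficients (the intercept row is dropped)
p_values <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

alpha   <- 0.05
keep    <- names(p_values)[p_values <= alpha]
discard <- names(p_values)[p_values >  alpha]

print(round(p_values, 4))
print(keep)
print(discard)
```

Applying the same extraction to `brazil.mlr` would list exactly which predictors fall on each side of the 0.05 threshold.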
brazil.mlr1 <- lm(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO + FIXED_PHONES_PR +
                    MOTORCYCLES_PR + CAR_PR + GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA +
                    GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA + ECONOMY_ACTIVE_RATIO +
                    CROP_PRODUCTION,
                  data = brazil3)
summary(brazil.mlr1)
##
## Call:
## lm(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO + FIXED_PHONES_PR +
## MOTORCYCLES_PR + CAR_PR + GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA +
## GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA + ECONOMY_ACTIVE_RATIO +
## CROP_PRODUCTION, data = brazil3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31718 -2871 -1190 979 260381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.430e+02 9.696e+02 -0.973 0.33079
## HDI_EDUCATION 9.863e+03 2.237e+03 4.409 1.06e-05 ***
## TAX_GDP_RATIO 1.303e+01 4.450e+00 2.927 0.00343 **
## FIXED_PHONES_PR 1.407e+04 2.830e+03 4.970 6.89e-07 ***
## MOTORCYCLES_PR -3.375e+03 1.767e+03 -1.910 0.05624 .
## CAR_PR 6.893e+03 1.418e+03 4.862 1.20e-06 ***
## GVA_SERVICES_CAPITA 1.004e+00 2.247e-02 44.668 < 2e-16 ***
## GVA_INDUSTRY_CAPITA 1.141e+00 1.283e-02 88.991 < 2e-16 ***
## GVA_PUBLIC_CAPITA 4.805e-01 5.992e-02 8.019 1.29e-15 ***
## GVA_AGROPEC_CAPITA 8.902e-01 2.092e-02 42.545 < 2e-16 ***
## ECONOMY_ACTIVE_RATIO -4.284e+02 1.087e+03 -0.394 0.69362
## CROP_PRODUCTION 6.745e-06 8.573e-07 7.868 4.29e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8487 on 5542 degrees of freedom
## Multiple R-squared: 0.8241, Adjusted R-squared: 0.8238
## F-statistic: 2361 on 11 and 5542 DF, p-value: < 2.2e-16
Since the adjusted R-squared drops by only 0.0003 (from 0.8241 to 0.8238), the removed variables contributed very little to the variance explained by the model. Since the p-value of the F-test is below the alpha value of 0.05, we reject the null hypothesis that all slope coefficients are zero, meaning that the overall model is significant. With an adjusted R-squared of 0.8238, we can conclude that the model is a good fit, since the predictors account for about 82.38% of the variance in GDP per capita.
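As a worked check, the adjusted R-squared values can be reproduced from the reported multiple R-squared, the number of observations and the number of predictors. The helper name `adj_r2` is an assumption for illustration. The inputs are taken from the two model summaries above, with n = 5554 inferred from the residual degrees of freedom (5538 + 15 + 1).

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

full    <- adj_r2(r2 = 0.8246, n = 5554, p = 15)  # first model
reduced <- adj_r2(r2 = 0.8241, n = 5554, p = 11)  # revised model

print(round(c(full = full, reduced = reduced, difference = full - reduced), 4))
```

Because the reported R-squared values are already rounded to four decimals, the reproduced figures agree with the summaries only up to rounding.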
## Variables Tolerance VIF
## 1 HDI_EDUCATION 0.2975816 3.360423
## 2 TAX_GDP_RATIO 0.9758489 1.024749
## 3 FIXED_PHONES_PR 0.3649607 2.740021
## 4 MOTORCYCLES_PR 0.9234899 1.082849
## 5 CAR_PR 0.3401589 2.939802
## 6 GVA_SERVICES_CAPITA 0.5629594 1.776327
## 7 GVA_INDUSTRY_CAPITA 0.8476002 1.179802
## 8 GVA_PUBLIC_CAPITA 0.9426659 1.060821
## 9 GVA_AGROPEC_CAPITA 0.7215136 1.385975
## 10 ECONOMY_ACTIVE_RATIO 0.5164827 1.936173
## 11 CROP_PRODUCTION 0.8039035 1.243930
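The VIF values are all well below 5 (and therefore also below 10). This means that there is no sign of multicollinearity among the explanatory variables, so the no-multicollinearity assumption is satisfied.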
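The call that produced the table above is not shown in the text. As a minimal sketch of what the two columns mean, assuming the usual definitions (VIF = 1 / (1 - R2) from regressing each predictor on the others, and tolerance = 1 / VIF), the same kind of table can be built in base R. The data frame `toy` and the helper `vif_table()` are hypothetical and use simulated data, not the Brazil dataset.

```r
# Simulated data: x3 is partly driven by x1, so both should show a VIF above 1
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.5 * x1 + rnorm(n)
toy <- data.frame(x1, x2, x3)

# Hypothetical helper: regress each predictor on all the others
vif_table <- function(data, predictors) {
  vif <- sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
  data.frame(Variables = predictors,
             Tolerance = unname(1 / vif),
             VIF       = unname(vif))
}

vif_out <- vif_table(toy, c("x1", "x2", "x3"))
vif_out
```

The same relationship holds in the table above: for HDI_EDUCATION, 1 / 0.2975816 gives the reported VIF of about 3.36.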
The Residual vs Fitted plot shows that the points lie along the horizontal line with no distinct pattern. The Actual vs Fitted plot shows that the points exhibit a linear relationship. Both plots indicate that the linearity assumption of the model has been satisfied.
The histogram of the residuals shows that they resemble a normal distribution.
The Q-Q plot above shows that the residuals conform to a normal distribution for the majority of the data, and the number of observations is very large. Thus, the normality assumption of the model can be considered satisfied.
##
## Suggested power transformation: 0.2316132
The Spread-Level plot above shows that the variance of the residuals is generally homogeneous. Hence, we can conclude that the model satisfies the homoscedasticity assumption.
Export residual of linear regression model
Joining the data frame with brazil2 sf
The map of the residuals above suggests that there might be spatial autocorrelation. To test this observation, a Global Moran’s I test needs to be performed.
Convert sf dataframe to an sp object
## class : SpatialPolygonsDataFrame
## features : 5554
## extent : 1552246, 6575781, 6030702, 10583412 (xmin, xmax, ymin, ymax)
## crs : +proj=merc +lon_0=-43 +lat_ts=-2 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## variables : 18
## names : GDP_CAPITA, ECONOMY_ACTIVE_RATIO, HDI_EDUCATION, HDI_LONGEVITY, TAX_GDP_RATIO, CROP_PRODUCTION, GVA_AGROPEC_CAPITA, GVA_INDUSTRY_CAPITA, GVA_PUBLIC_CAPITA, GVA_SERVICES_CAPITA, FIXED_PHONES_PR, TV_PR, CAR_PR, MOTORCYCLES_PR, TRACTOR_PR, ...
## min values : 3190.57, 0.0255745925616381, 0.207, 0.672, -2.09195095948827, 0, 0, 0.0947015518611966, 2.39651470307824, 0.460048426150121, 0.00046652670865407, 0.000194969779684149, 5.14628309703317e-05, 0.000102925661940663, 0, ...
## max values : 314637.69, 0.709886841723488, 0.825, 0.894, 324.668401779554, 3274885000, 120903.77975246, 253119.927858787, 14897.7042364311, 104596.72730638, 1.02307692307692, 0.496118560338744, 0.65576171875, 0.545236779067625, 0.0102649006622517, ...
Determine the cut-off distance
coords <- coordinates(brazil_res_sp)
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = FALSE))
summary(k1dists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1467 12482 16798 22292 24987 370668
Compute the distance-based weight matrix using a cut-off distance greater than the maximum obtained from the summary above.
## Neighbour list object:
## Number of regions: 5554
## Number of nonzero links: 2765424
## Percentage nonzero weights: 8.964993
## Average number of links: 497.9157
## Link number distribution:
##
## 1 3 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 1 1 1 2 5 3 2 8 12 7 6 4 9 7 6 3 3 9 5 10
## 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 46
## 4 4 7 6 4 3 3 1 2 2 5 2 4 4 5 2 1 6 2 7
## 47 48 49 50 51 52 53 54 55 56 57 58 59 61 62 63 64 65 66 67
## 5 5 8 3 6 9 7 4 10 9 6 8 7 2 4 4 2 2 4 5
## 68 69 70 71 72 73 74 75 76 77 78 80 81 82 83 85 86 88 89 90
## 6 2 2 4 1 4 1 7 6 3 4 1 9 3 2 3 1 2 3 3
## 93 94 95 97 98 99 100 101 102 103 104 105 107 108 109 110 111 112 113 114
## 1 1 2 2 1 1 2 4 3 3 1 3 4 3 1 3 4 2 3 2
## 115 116 117 118 119 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137
## 5 7 2 1 3 6 4 1 3 4 2 3 5 5 5 5 1 2 1 7
## 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157
## 6 9 5 7 6 10 7 8 5 5 2 6 5 7 6 4 4 1 5 7
## 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
## 8 4 6 5 3 4 4 5 4 6 6 5 3 6 1 3 3 3 1 5
## 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 197 198
## 6 2 6 1 1 3 3 5 1 1 5 2 2 4 4 5 4 1 1 4
## 199 200 201 202 203 204 206 207 208 209 210 211 212 213 214 215 216 217 218 219
## 2 1 3 6 7 2 6 2 5 5 7 3 4 4 6 7 2 8 5 1
## 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
## 2 5 6 7 4 3 8 7 3 4 1 7 1 4 4 12 4 9 7 5
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 257 258 259 260
## 5 3 5 5 5 5 10 10 6 7 2 4 5 6 2 8 7 5 8 9
## 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280
## 12 3 5 13 4 5 6 14 8 5 10 6 4 7 6 5 4 4 6 3
## 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
## 10 4 3 7 1 5 7 4 5 3 4 7 6 4 9 4 5 3 8 4
## 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320
## 5 3 8 4 5 6 6 6 7 3 3 4 5 6 6 5 6 5 7 7
## 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## 12 6 3 4 6 5 12 2 3 9 2 6 4 5 10 6 2 9 5 6
## 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
## 8 6 4 7 6 8 5 8 11 11 8 6 9 5 4 7 10 7 4 11
## 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380
## 10 7 5 6 7 10 8 9 8 7 10 9 11 13 10 11 6 8 6 4
## 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400
## 4 13 8 5 7 4 6 7 7 10 5 9 8 12 4 5 9 9 4 5
## 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420
## 10 7 4 6 4 4 4 10 10 6 8 4 6 4 5 6 11 4 15 9
## 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440
## 6 5 6 3 5 5 5 2 4 5 2 5 4 5 4 6 6 2 4 4
## 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460
## 5 9 7 4 6 7 4 3 5 5 2 9 3 4 7 6 8 7 5 4
## 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480
## 3 4 2 4 3 3 6 7 6 7 7 5 4 3 5 4 8 5 9 5
## 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500
## 3 4 8 7 5 2 8 4 12 5 7 5 12 3 4 4 6 4 12 5
## 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520
## 5 5 6 3 5 5 5 3 5 9 7 8 6 11 8 3 3 4 8 5
## 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540
## 5 7 8 6 3 3 7 9 4 7 5 9 10 6 6 7 5 8 6 5
## 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560
## 3 4 8 9 8 8 7 7 6 7 9 6 5 6 5 10 4 7 8 4
## 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580
## 7 10 14 11 6 4 8 10 13 7 7 9 15 14 9 6 12 7 13 8
## 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600
## 6 12 11 8 12 5 9 8 12 10 10 9 7 4 13 5 11 10 13 11
## 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620
## 8 9 13 13 10 11 8 9 9 7 7 5 10 13 10 14 13 10 7 9
## 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640
## 8 14 15 8 13 11 13 7 14 9 8 7 14 12 10 9 14 12 11 13
## 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660
## 12 10 12 10 12 11 14 8 8 9 10 6 12 13 5 10 11 13 9 15
## 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680
## 6 10 13 14 8 13 5 8 7 12 6 10 10 9 13 10 11 15 12 11
## 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700
## 11 10 8 13 7 11 10 15 3 8 13 9 8 13 11 13 13 9 11 14
## 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720
## 9 10 8 8 13 13 10 5 12 10 7 12 12 7 11 9 6 10 8 8
## 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740
## 9 7 9 11 7 6 9 9 7 7 8 10 9 6 15 10 12 15 9 10
## 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760
## 10 9 7 6 12 13 9 10 8 11 8 8 6 8 8 9 9 12 14 9
## 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780
## 5 7 10 5 9 13 7 6 8 13 5 9 8 5 12 11 9 7 6 11
## 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800
## 13 10 12 6 7 7 9 8 4 5 7 8 8 8 11 11 4 10 8 6
## 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820
## 1 15 9 5 10 11 7 9 10 10 8 9 4 4 5 4 7 7 8 10
## 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840
## 6 4 15 6 10 5 6 12 4 6 7 5 4 5 6 4 5 7 3 6
## 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860
## 6 5 8 7 4 7 6 8 5 3 3 1 3 1 5 4 6 3 1 2
## 861 862 863 864 865
## 2 2 3 2 1
## 1 least connected region:
## 1519 with 1 link
## 1 most connected region:
## 1374 with 865 links
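The reason for choosing a cut-off above the largest nearest-neighbour distance is that it guarantees every region at least one neighbour, which matches the "least connected region ... with 1 link" line in the output above. The base R sketch below illustrates the idea on simulated points; `toy_coords` and the 1% margin are assumptions made for illustration only and are not the values used in the analysis.

```r
# Simulated point locations (illustrative, not the municipality centroids)
set.seed(42)
toy_coords <- cbind(x = runif(30, 0, 100), y = runif(30, 0, 100))

# Pairwise Euclidean distances; ignore each point's distance to itself
d <- as.matrix(dist(toy_coords))
diag(d) <- Inf

# Distance from each point to its nearest neighbour
nn_dist <- apply(d, 1, min)

# Cut-off slightly above the largest nearest-neighbour distance
cutoff <- max(nn_dist) * 1.01

# Binary weights: 1 if two points lie within the cut-off, 0 otherwise
w_binary <- (d <= cutoff) * 1
links_per_point <- rowSums(w_binary)
summary(links_per_point)
```

Because the cut-off is driven by the most isolated point, densely packed areas end up with many more links than sparse ones, which is the same imbalance visible in the link number distribution above.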
Converting output list into spatial weights
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5554
## Number of nonzero links: 2765424
## Percentage nonzero weights: 8.964993
## Average number of links: 497.9157
## 1 least connected region:
## 1519 with 1 link
## 1 most connected region:
## 1374 with 865 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 5554 30846916 2765424 5530848 6687081880
Performing Moran’s I test for spatial residual autocorrelation.
H0: Residual for regression model is randomly distributed.
H1: Residual for regression model is not randomly distributed.
The confidence level is 95%, so the alpha value is 0.05.
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO +
## FIXED_PHONES_PR + MOTORCYCLES_PR + CAR_PR + GVA_SERVICES_CAPITA +
## GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA +
## ECONOMY_ACTIVE_RATIO + CROP_PRODUCTION, data = brazil3)
## weights: nb_lw
##
## Moran I statistic standard deviate = 2.3593, p-value = 0.009156
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.195461e-03 -4.511130e-04 4.870948e-07
The Global Moran’s I test for residual spatial autocorrelation gives a p-value of 0.009156, which is less than the alpha value of 0.05. Hence, we reject the null hypothesis that the residuals are randomly distributed.
Since the observed Global Moran’s I of 0.0012 is only marginally greater than 0, we can infer that the residuals show weak clustering.
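The standard deviate and p-value in the test output above can be reproduced from the three sample estimates it prints. This is a small worked check in base R, using only numbers copied from that output; the variable names are illustrative.

```r
# Values copied from the "sample estimates" block of the test output above
observed_i <- 1.195461e-03
expected_i <- -4.511130e-04
variance_i <- 4.870948e-07

# Standard deviate: how many standard deviations the observed I lies above its expectation
z <- (observed_i - expected_i) / sqrt(variance_i)

# One-sided p-value, matching the "alternative hypothesis: greater" line
p_value <- pnorm(z, lower.tail = FALSE)

round(c(z = z, p_value = p_value), 6)
```

The result agrees with the reported standard deviate of 2.3593 and p-value of 0.009156.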
Standardising the regression coefficients and plotting on a barplot
plot_coeffs <- function(mlr_model) {
  # Standardise the coefficients so that variables on different scales are comparable
  coeffs <- lm.beta(mlr_model)
  # Sort by absolute size so that the most influential variable is drawn last (rightmost)
  sorted_coeffs <- sort(abs(coeffs$standardized.coefficients), decreasing = FALSE)
  mp <- barplot(sorted_coeffs, col = "#3F97D0", xaxt = 'n', main = " Standardized Regression Coefficients")
  text(mp, par("usr")[3], labels = names(sorted_coeffs), srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex = 0.6)
}
plot_coeffs(brazil.mlr1)
The above barplot of the standardised regression coefficients shows which factors impact GDP per capita the most in the Multiple Linear Regression model. The rank of importance runs from right to left. The plot reveals that Gross Value Added of Industry per Capita, Gross Value Added of Services per Capita and Gross Value Added of Agropec per Capita are the variables which affect GDP per Capita the most.
Order of impact: 1) Gross Value Added of Industry per Capita 2) Gross Value Added of Services per Capita 3) Gross Value Added of Agropec per Capita 4) Crop Production Revenue 5) Car penetration rate 6) Gross Value Added of Public per Capita 7) Fixed Phones penetration rate 8) Human Development Index of Education 9) Tax to GDP Ratio 10) Motorcycles penetration rate 11) Economy Active Ratio
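To make the ranking above easier to interpret, the sketch below shows what standardising a coefficient does, assuming the usual definition (raw coefficient multiplied by the standard deviation of the predictor and divided by the standard deviation of the response). It uses simulated data and base R only; `toy`, `fit` and `std_beta` are hypothetical names, and this is not the internal code of `lm.beta`.

```r
# Simulated data with predictors on very different scales
set.seed(7)
toy <- data.frame(x1 = rnorm(100, sd = 10), x2 = rnorm(100, sd = 0.1))
toy$y <- 3 + 0.5 * toy$x1 + 20 * toy$x2 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = toy)

# Raw slopes (intercept dropped) and the spread of each predictor
b  <- coef(fit)[-1]
sx <- sapply(toy[names(b)], sd)

# Standardised coefficients: effect of a one-SD change in x, in SD units of y
std_beta <- b * sx / sd(toy$y)

# Equivalent route: refit the model on z-scored variables
fit_z <- lm(scale(y) ~ scale(x1) + scale(x2), data = toy)

rbind(raw = b, standardised = std_beta)
```

In this toy example the raw slope of `x2` is far larger than that of `x1`, yet its standardised coefficient is smaller, which is why the barplot ranks variables by standardised rather than raw coefficients.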
bw.fixed <- bw.gwr(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO + FIXED_PHONES_PR + MOTORCYCLES_PR
+ CAR_PR + GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA +
ECONOMY_ACTIVE_RATIO + CROP_PRODUCTION, data = brazil_res_sp, approach = "CV", kernel = "gaussian", adaptive = FALSE, longlat = FALSE)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 3950062 CV score: 404074923255
## Fixed bandwidth: 2441761 CV score: 403738611579
## Fixed bandwidth: 1509579 CV score: 402976082346
## Fixed bandwidth: 933459.6 CV score: 401606912678
## Fixed bandwidth: 577397.9 CV score: 400347808787
## Fixed bandwidth: 357339.7 CV score: 399478534229
## Fixed bandwidth: 221336.3 CV score: 400991378864
## Fixed bandwidth: 441394.5 CV score: 399886581014
## Fixed bandwidth: 305391.1 CV score: 399358467661
## Fixed bandwidth: 273285 CV score: 399574460011
## Fixed bandwidth: 325233.7 CV score: 399360724808
## Fixed bandwidth: 293127.6 CV score: 399402768665
## Fixed bandwidth: 312970.3 CV score: 399349934389
## Fixed bandwidth: 317654.5 CV score: 399350797543
## Fixed bandwidth: 310075.3 CV score: 399351667518
## Fixed bandwidth: 314759.5 CV score: 399349747393
## Fixed bandwidth: 315865.3 CV score: 3.9935e+11
## Fixed bandwidth: 314076.1 CV score: 399349741383
## Fixed bandwidth: 313653.7 CV score: 399349785285
## Fixed bandwidth: 314337.1 CV score: 399349732495
## Fixed bandwidth: 314498.4 CV score: 399349733927
## Fixed bandwidth: 314237.4 CV score: 399349734252
## Fixed bandwidth: 314398.7 CV score: 399349732419
## Fixed bandwidth: 314436.8 CV score: 399349732757
## Fixed bandwidth: 314375.2 CV score: 399349732357
## Fixed bandwidth: 314360.6 CV score: 399349732375
## Fixed bandwidth: 314384.2 CV score: 399349732367
## Fixed bandwidth: 314369.6 CV score: 399349732359
## Fixed bandwidth: 314378.6 CV score: 399349732359
## Fixed bandwidth: 314373.1 CV score: 399349732357
## Fixed bandwidth: 314371.8 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
## Fixed bandwidth: 314374.4 CV score: 399349732357
## Fixed bandwidth: 314373.6 CV score: 399349732357
## Fixed bandwidth: 314374.1 CV score: 399349732357
## Fixed bandwidth: 314373.8 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
## Fixed bandwidth: 314373.8 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
## Fixed bandwidth: 314373.9 CV score: 399349732357
The result shows that the recommended bandwidth is 314373.9 metres.
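To give a feel for what a fixed bandwidth of this size means, the sketch below evaluates a Gaussian distance-decay kernel at a few distances. It assumes the commonly used form w = exp(-0.5 * (d / bw)^2); the example distances are arbitrary and chosen only for illustration.

```r
# Fixed bandwidth in metres, taken from the CV search above
bw <- 314373.9

# Example distances in metres: same location, 100 km, one bandwidth, two bandwidths
d <- c(0, 100e3, bw, 2 * bw)

# Gaussian kernel weights (assumed form): nearby observations count almost fully,
# distant ones are down-weighted smoothly but never reach exactly zero
w <- exp(-0.5 * (d / bw)^2)

data.frame(distance_km = round(d / 1000, 1), weight = round(w, 3))
```

Under this assumption an observation one bandwidth away (about 314 km) still carries a weight of roughly 0.61, and one at twice that distance roughly 0.14, so each local regression draws on a fairly wide area.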
gwr.fixed <- gwr.basic(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO + FIXED_PHONES_PR + MOTORCYCLES_PR
+ CAR_PR + GVA_SERVICES_CAPITA + GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA +
ECONOMY_ACTIVE_RATIO + CROP_PRODUCTION, data = brazil_res_sp, bw = bw.fixed, kernel = 'gaussian', adaptive = FALSE, longlat = FALSE)
gwr.fixed
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 21:12:35
## Call:
## gwr.basic(formula = GDP_CAPITA ~ HDI_EDUCATION + TAX_GDP_RATIO +
## FIXED_PHONES_PR + MOTORCYCLES_PR + CAR_PR + GVA_SERVICES_CAPITA +
## GVA_INDUSTRY_CAPITA + GVA_PUBLIC_CAPITA + GVA_AGROPEC_CAPITA +
## ECONOMY_ACTIVE_RATIO + CROP_PRODUCTION, data = brazil_res_sp,
## bw = bw.fixed, kernel = "gaussian", adaptive = FALSE, longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: HDI_EDUCATION TAX_GDP_RATIO FIXED_PHONES_PR MOTORCYCLES_PR CAR_PR GVA_SERVICES_CAPITA GVA_INDUSTRY_CAPITA GVA_PUBLIC_CAPITA GVA_AGROPEC_CAPITA ECONOMY_ACTIVE_RATIO CROP_PRODUCTION
## Number of data points: 5554
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31718 -2871 -1190 979 260381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.430e+02 9.696e+02 -0.973 0.33079
## HDI_EDUCATION 9.863e+03 2.237e+03 4.409 1.06e-05 ***
## TAX_GDP_RATIO 1.303e+01 4.450e+00 2.927 0.00343 **
## FIXED_PHONES_PR 1.407e+04 2.830e+03 4.970 6.89e-07 ***
## MOTORCYCLES_PR -3.375e+03 1.767e+03 -1.910 0.05624 .
## CAR_PR 6.893e+03 1.418e+03 4.862 1.20e-06 ***
## GVA_SERVICES_CAPITA 1.004e+00 2.247e-02 44.668 < 2e-16 ***
## GVA_INDUSTRY_CAPITA 1.141e+00 1.283e-02 88.991 < 2e-16 ***
## GVA_PUBLIC_CAPITA 4.805e-01 5.992e-02 8.019 1.29e-15 ***
## GVA_AGROPEC_CAPITA 8.902e-01 2.092e-02 42.545 < 2e-16 ***
## ECONOMY_ACTIVE_RATIO -4.284e+02 1.087e+03 -0.394 0.69362
## CROP_PRODUCTION 6.745e-06 8.573e-07 7.868 4.29e-15 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 8487 on 5542 degrees of freedom
## Multiple R-squared: 0.8241
## Adjusted R-squared: 0.8238
## F-statistic: 2361 on 11 and 5542 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 399174393644
## Sigma(hat): 8479.234
## AIC: 116261.6
## AICc: 116261.7
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 314373.9
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -7.6082e+03 -1.9627e+03 1.0177e+02 2.2388e+03
## HDI_EDUCATION -1.1020e+04 3.6314e+03 6.9588e+03 1.2913e+04
## TAX_GDP_RATIO -4.1057e+01 4.7110e+00 1.1369e+01 1.6731e+01
## FIXED_PHONES_PR -6.9677e+02 1.0266e+04 1.3975e+04 2.0239e+04
## MOTORCYCLES_PR -1.9635e+04 -1.0399e+04 -5.9461e+03 -2.1926e+03
## CAR_PR -5.7283e+04 4.3780e+03 9.0509e+03 1.2540e+04
## GVA_SERVICES_CAPITA -1.1925e-01 8.1422e-01 9.5003e-01 1.0353e+00
## GVA_INDUSTRY_CAPITA 1.6049e-01 1.1026e+00 1.1950e+00 1.2401e+00
## GVA_PUBLIC_CAPITA -1.3503e-01 2.3471e-01 3.1833e-01 4.5668e-01
## GVA_AGROPEC_CAPITA 4.5469e-01 8.1340e-01 8.6809e-01 9.3042e-01
## ECONOMY_ACTIVE_RATIO -1.6861e+04 -3.5677e+02 1.0750e+03 2.3356e+03
## CROP_PRODUCTION -2.8993e-06 2.0377e-06 4.0250e-06 5.9686e-06
## Max.
## Intercept 1.3774e+04
## HDI_EDUCATION 2.7730e+04
## TAX_GDP_RATIO 3.3510e+01
## FIXED_PHONES_PR 1.2153e+05
## MOTORCYCLES_PR 3.1469e+04
## CAR_PR 6.3061e+04
## GVA_SERVICES_CAPITA 1.8108e+00
## GVA_INDUSTRY_CAPITA 2.3499e+00
## GVA_PUBLIC_CAPITA 9.3790e-01
## GVA_AGROPEC_CAPITA 1.0882e+00
## ECONOMY_ACTIVE_RATIO 5.3148e+03
## CROP_PRODUCTION 0.0000e+00
## ************************Diagnostic information*************************
## Number of data points: 5554
## Effective number of parameters (2trace(S) - trace(S'S)): 224.7197
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5329.28
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 116053.8
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 115878.5
## Residual sum of squares: 363481349413
## R-square value: 0.8398709
## Adjusted R-square value: 0.8331175
##
## ***********************************************************************
## Program stops at: 2020-05-31 21:12:56
The Geographically Weighted Regression model has an adjusted R2 value of 0.8331, which is marginally better than the Multiple Linear Regression model’s adjusted R2 value of 0.8238. The GWR model also has a lower AICc value (116053.8) than the MLR model (116261.7), suggesting that the GWR model is a better fit than the MLR model.
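The size of the improvement can be made explicit by differencing the two sets of diagnostics. The sketch below uses only the values printed in the `gwr.fixed` output above; the vector names are illustrative.

```r
# Diagnostics copied from the gwr.fixed output above
mlr <- c(adj_r2 = 0.8238, aicc = 116261.7)   # global regression
gwr <- c(adj_r2 = 0.8331, aicc = 116053.8)   # geographically weighted regression

# Positive adj_r2 difference and negative aicc difference both favour GWR
improvement <- gwr - mlr
improvement
```

The AICc drops by about 208 points. A commonly cited rule of thumb treats AICc differences of more than about 3 as meaningful, so the gain is far larger than that threshold even though the adjusted R2 rises by less than 0.01.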
Creating a dataframe with the mean parameter estimates
mean_coeffs <- c(mean(gwr.fixed$SDF[[2]]), mean(gwr.fixed$SDF[[3]]), mean(gwr.fixed$SDF[[4]]), mean(gwr.fixed$SDF[[5]]), mean(gwr.fixed$SDF[[6]]), mean(gwr.fixed$SDF[[7]]), mean(gwr.fixed$SDF[[8]]), mean(gwr.fixed$SDF[[9]]), mean(gwr.fixed$SDF[[10]]), mean(gwr.fixed$SDF[[11]]), mean(gwr.fixed$SDF[[12]]))
naam <- c("HDI_EDUCATION", "TAX_GDP_RATIO", "FIXED_PHONES_PR", "MOTORCYCLES_PR", "CAR_PR", "GVA_SERVICES_CAPITA", "GVA_INDUSTRY_CAPITA", "GVA_PUBLIC_CAPITA", "GVA_AGROPEC_CAPITA", "ECONOMY_ACTIVE_RATIO", "CROP_PRODUCTION")
new <- data.frame(naam, abs(scale(mean_coeffs)), stringsAsFactors = FALSE)
Standardising the regression coefficients and plotting on a barplot
plot_coeffs <- function(df) {
  # Column 2 holds the scaled mean coefficients, column 1 the variable names
  mp <- barplot(df[,2], col = "#3F97D0", xaxt = 'n', main = " Standardized Regression Coefficients")
  text(mp, par("usr")[3], labels = df[,1] , srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex = 0.6)
}
plot_coeffs(new)
The above barplot of the standardised regression coefficients shows which factors impact GDP per capita the most in the Geographically Weighted Regression model. The plot reveals that, contrary to the initial Multiple Linear Regression model, the Fixed Phones penetration rate, Motorcycles penetration rate and Human Development Index of Education affect GDP per Capita the most.
Order of impact:
To visualise the fields in SDF, we first need to convert it into an sf data.frame by using the code chunk below.
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
## [1] "sf" "data.frame"
## Simple feature collection with 5554 features and 42 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1552246 ymin: 6030702 xmax: 6575781 ymax: 10583410
## CRS: unknown
## First 10 features:
## Intercept HDI_EDUCATION TAX_GDP_RATIO FIXED_PHONES_PR MOTORCYCLES_PR
## 1 10954.159 172.4247 -31.476914 44676.99 -14174.661
## 2 10196.098 -4212.7457 -13.498880 57466.91 -9835.845
## 3 11154.227 6880.1438 -41.056785 40914.20 -16205.920
## 4 11935.386 -4092.2564 -30.392111 36108.76 -15822.594
## 5 11053.260 4402.1239 -40.719733 39620.79 -15780.344
## 6 11527.793 5184.3945 -40.642807 39974.83 -16454.918
## 7 11404.985 3112.1639 -40.255860 38774.29 -16071.823
## 8 9005.498 2409.1453 -12.561993 65088.11 -8913.356
## 9 12280.928 -3736.6189 -32.360044 32611.87 -16916.690
## 10 8445.486 2213.2301 -5.485522 69488.56 -6573.313
## CAR_PR GVA_SERVICES_CAPITA GVA_INDUSTRY_CAPITA GVA_PUBLIC_CAPITA
## 1 -35737.76 1.603288 2.260098 0.5698952
## 2 -10228.80 1.221938 2.220803 0.6008003
## 3 -47479.56 1.739639 2.123274 0.2882936
## 4 -16266.29 1.578225 2.234340 0.5650180
## 5 -43803.21 1.718766 2.177504 0.4076405
## 6 -43024.69 1.721513 2.140551 0.3060952
## 7 -39695.01 1.704993 2.184546 0.4084554
## 8 -37505.03 1.303776 2.349852 0.5621602
## 9 -14922.90 1.603932 2.212873 0.5115247
## 10 -27626.87 1.085990 2.320830 0.4916841
## GVA_AGROPEC_CAPITA ECONOMY_ACTIVE_RATIO CROP_PRODUCTION y yhat
## 1 0.6843359 -7539.600 5.431718e-06 18732.17 17001.499
## 2 0.6194032 -8695.373 1.019433e-05 20618.18 14492.891
## 3 0.7687932 -9652.600 3.428926e-06 21202.96 20555.073
## 4 0.7170620 -7892.698 4.723564e-06 22130.78 24927.210
## 5 0.7430808 -8564.552 3.840570e-06 22721.08 24789.847
## 6 0.7696649 -9434.856 3.371903e-06 16410.55 11972.815
## 7 0.7471046 -8569.085 3.747556e-06 27040.60 29787.563
## 8 0.6057575 -8227.085 9.596858e-06 12102.82 11531.323
## 9 0.7442763 -8040.357 3.968270e-06 16650.27 17557.060
## 10 0.5889294 -8825.693 1.239914e-05 15732.01 8463.108
## residual CV_Score Stud_residual Intercept_SE HDI_EDUCATION_SE
## 1 1730.6711 0 0.22618396 9599.383 22260.68
## 2 6125.2894 0 0.90489398 9384.807 21458.47
## 3 647.8870 0 0.08509723 8621.276 19456.41
## 4 -2796.4300 0 -0.36050367 8902.027 20397.56
## 5 -2068.7665 0 -0.26724254 8945.367 20364.71
## 6 4437.7354 0 0.56873955 8488.416 19175.42
## 7 -2746.9626 0 -0.36146002 8794.908 20014.91
## 8 571.4975 0 0.08244513 10324.953 24856.37
## 9 -906.7896 0 -0.11577234 8520.737 19454.54
## 10 7268.9019 0 1.10904302 10063.591 24313.26
## TAX_GDP_RATIO_SE FIXED_PHONES_PR_SE MOTORCYCLES_PR_SE CAR_PR_SE
## 1 31.81143 54399.21 10812.106 39304.99
## 2 38.01403 68431.01 12369.362 45610.19
## 3 31.63683 40591.21 9066.132 30398.14
## 4 31.02281 51739.51 10375.957 35958.87
## 5 31.29623 44543.40 9636.084 33447.09
## 6 30.80224 40649.05 9106.458 30276.86
## 7 30.68438 44198.14 9601.494 32927.62
## 8 37.29823 68348.31 12593.327 48206.72
## 9 29.93757 47572.43 9893.325 33312.04
## 10 39.99362 72242.61 13323.530 50533.12
## GVA_SERVICES_CAPITA_SE GVA_INDUSTRY_CAPITA_SE GVA_PUBLIC_CAPITA_SE
## 1 0.2056554 0.2699172 0.4147363
## 2 0.2906447 0.3647075 0.4545954
## 3 0.1546364 0.2110185 0.3452859
## 4 0.1837050 0.2571417 0.3738385
## 5 0.1677861 0.2279474 0.3660964
## 6 0.1537065 0.2114465 0.3412041
## 7 0.1650308 0.2259451 0.3597475
## 8 0.3013138 0.3624579 0.4882068
## 9 0.1670501 0.2387749 0.3480715
## 10 0.3502117 0.4099543 0.4999573
## GVA_AGROPEC_CAPITA_SE ECONOMY_ACTIVE_RATIO_SE CROP_PRODUCTION_SE
## 1 0.12166143 10020.357 3.347766e-06
## 2 0.17710249 10511.741 5.020821e-06
## 3 0.07663927 8898.762 2.289406e-06
## 4 0.10020572 9443.184 2.826054e-06
## 5 0.08759679 9271.365 2.531778e-06
## 6 0.07634677 8811.117 2.279590e-06
## 7 0.08542574 9155.312 2.480917e-06
## 8 0.19523319 10926.989 5.488902e-06
## 9 0.08718054 9033.505 2.517253e-06
## 10 0.21882724 11059.638 6.453407e-06
## Intercept_TV HDI_EDUCATION_TV TAX_GDP_RATIO_TV FIXED_PHONES_PR_TV
## 1 1.1411316 0.007745705 -0.9894845 0.8212801
## 2 1.0864473 -0.196320896 -0.3551026 0.8397788
## 3 1.2938022 0.353618312 -1.2977529 1.0079570
## 4 1.3407492 -0.200624821 -0.9796700 0.6978953
## 5 1.2356407 0.216164368 -1.3011067 0.8894872
## 6 1.3580617 0.270366614 -1.3194757 0.9834137
## 7 1.2967714 0.155492279 -1.3119332 0.8772834
## 8 0.8722072 0.096922644 -0.3367987 0.9523003
## 9 1.4412988 -0.192069261 -1.0809175 0.6855204
## 10 0.8392119 0.091029752 -0.1371599 0.9618777
## MOTORCYCLES_PR_TV CAR_PR_TV GVA_SERVICES_CAPITA_TV GVA_INDUSTRY_CAPITA_TV
## 1 -1.3109991 -0.9092424 7.795989 8.373301
## 2 -0.7951780 -0.2242656 4.204234 6.089273
## 3 -1.7875230 -1.5619231 11.249869 10.062028
## 4 -1.5249286 -0.4523583 8.591084 8.689139
## 5 -1.6376305 -1.3096271 10.243795 9.552663
## 6 -1.8069505 -1.4210419 11.200003 10.123373
## 7 -1.6738877 -1.2055234 10.331361 9.668480
## 8 -0.7077840 -0.7780040 4.326969 6.483102
## 9 -1.7099095 -0.4479731 9.601502 9.267613
## 10 -0.4933612 -0.5467082 3.100954 5.661192
## GVA_PUBLIC_CAPITA_TV GVA_AGROPEC_CAPITA_TV ECONOMY_ACTIVE_RATIO_TV
## 1 1.3741147 5.624921 -0.7524283
## 2 1.3216153 3.497428 -0.8272059
## 3 0.8349416 10.031322 -1.0847126
## 4 1.5113960 7.155899 -0.8358089
## 5 1.1134785 8.482968 -0.9237639
## 6 0.8971030 10.081171 -1.0707901
## 7 1.1353949 8.745662 -0.9359687
## 8 1.1514796 3.102738 -0.7529142
## 9 1.4695968 8.537183 -0.8900594
## 10 0.9834521 2.691298 -0.7980092
## CROP_PRODUCTION_TV Local_R2 geometry
## 1 1.622490 0.9434708 MULTIPOLYGON (((2864555 868...
## 2 2.030411 0.9196799 MULTIPOLYGON (((2826525 891...
## 3 1.497736 0.9520534 MULTIPOLYGON (((3067467 850...
## 4 1.671434 0.9430179 MULTIPOLYGON (((2997373 874...
## 5 1.516946 0.9502375 MULTIPOLYGON (((2941862 855...
## 6 1.479170 0.9517036 MULTIPOLYGON (((3052559 855...
## 7 1.510553 0.9500395 MULTIPOLYGON (((2959332 858...
## 8 1.748411 0.9247976 MULTIPOLYGON (((2695747 870...
## 9 1.576429 0.9455581 MULTIPOLYGON (((3003217 877...
## 10 1.921332 0.9101115 MULTIPOLYGON (((2510484 884...
Assigning a CRS
## Simple feature collection with 5554 features and 42 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1552246 ymin: 6030702 xmax: 6575781 ymax: 10583410
## CRS: EPSG:5641
## First 10 features:
## Intercept HDI_EDUCATION TAX_GDP_RATIO FIXED_PHONES_PR MOTORCYCLES_PR
## 1 10954.159 172.4247 -31.476914 44676.99 -14174.661
## 2 10196.098 -4212.7457 -13.498880 57466.91 -9835.845
## 3 11154.227 6880.1438 -41.056785 40914.20 -16205.920
## 4 11935.386 -4092.2564 -30.392111 36108.76 -15822.594
## 5 11053.260 4402.1239 -40.719733 39620.79 -15780.344
## 6 11527.793 5184.3945 -40.642807 39974.83 -16454.918
## 7 11404.985 3112.1639 -40.255860 38774.29 -16071.823
## 8 9005.498 2409.1453 -12.561993 65088.11 -8913.356
## 9 12280.928 -3736.6189 -32.360044 32611.87 -16916.690
## 10 8445.486 2213.2301 -5.485522 69488.56 -6573.313
## CAR_PR GVA_SERVICES_CAPITA GVA_INDUSTRY_CAPITA GVA_PUBLIC_CAPITA
## 1 -35737.76 1.603288 2.260098 0.5698952
## 2 -10228.80 1.221938 2.220803 0.6008003
## 3 -47479.56 1.739639 2.123274 0.2882936
## 4 -16266.29 1.578225 2.234340 0.5650180
## 5 -43803.21 1.718766 2.177504 0.4076405
## 6 -43024.69 1.721513 2.140551 0.3060952
## 7 -39695.01 1.704993 2.184546 0.4084554
## 8 -37505.03 1.303776 2.349852 0.5621602
## 9 -14922.90 1.603932 2.212873 0.5115247
## 10 -27626.87 1.085990 2.320830 0.4916841
## GVA_AGROPEC_CAPITA ECONOMY_ACTIVE_RATIO CROP_PRODUCTION y yhat
## 1 0.6843359 -7539.600 5.431718e-06 18732.17 17001.499
## 2 0.6194032 -8695.373 1.019433e-05 20618.18 14492.891
## 3 0.7687932 -9652.600 3.428926e-06 21202.96 20555.073
## 4 0.7170620 -7892.698 4.723564e-06 22130.78 24927.210
## 5 0.7430808 -8564.552 3.840570e-06 22721.08 24789.847
## 6 0.7696649 -9434.856 3.371903e-06 16410.55 11972.815
## 7 0.7471046 -8569.085 3.747556e-06 27040.60 29787.563
## 8 0.6057575 -8227.085 9.596858e-06 12102.82 11531.323
## 9 0.7442763 -8040.357 3.968270e-06 16650.27 17557.060
## 10 0.5889294 -8825.693 1.239914e-05 15732.01 8463.108
## residual CV_Score Stud_residual Intercept_SE HDI_EDUCATION_SE
## 1 1730.6711 0 0.22618396 9599.383 22260.68
## 2 6125.2894 0 0.90489398 9384.807 21458.47
## 3 647.8870 0 0.08509723 8621.276 19456.41
## 4 -2796.4300 0 -0.36050367 8902.027 20397.56
## 5 -2068.7665 0 -0.26724254 8945.367 20364.71
## 6 4437.7354 0 0.56873955 8488.416 19175.42
## 7 -2746.9626 0 -0.36146002 8794.908 20014.91
## 8 571.4975 0 0.08244513 10324.953 24856.37
## 9 -906.7896 0 -0.11577234 8520.737 19454.54
## 10 7268.9019 0 1.10904302 10063.591 24313.26
## TAX_GDP_RATIO_SE FIXED_PHONES_PR_SE MOTORCYCLES_PR_SE CAR_PR_SE
## 1 31.81143 54399.21 10812.106 39304.99
## 2 38.01403 68431.01 12369.362 45610.19
## 3 31.63683 40591.21 9066.132 30398.14
## 4 31.02281 51739.51 10375.957 35958.87
## 5 31.29623 44543.40 9636.084 33447.09
## 6 30.80224 40649.05 9106.458 30276.86
## 7 30.68438 44198.14 9601.494 32927.62
## 8 37.29823 68348.31 12593.327 48206.72
## 9 29.93757 47572.43 9893.325 33312.04
## 10 39.99362 72242.61 13323.530 50533.12
## GVA_SERVICES_CAPITA_SE GVA_INDUSTRY_CAPITA_SE GVA_PUBLIC_CAPITA_SE
## 1 0.2056554 0.2699172 0.4147363
## 2 0.2906447 0.3647075 0.4545954
## 3 0.1546364 0.2110185 0.3452859
## 4 0.1837050 0.2571417 0.3738385
## 5 0.1677861 0.2279474 0.3660964
## 6 0.1537065 0.2114465 0.3412041
## 7 0.1650308 0.2259451 0.3597475
## 8 0.3013138 0.3624579 0.4882068
## 9 0.1670501 0.2387749 0.3480715
## 10 0.3502117 0.4099543 0.4999573
## GVA_AGROPEC_CAPITA_SE ECONOMY_ACTIVE_RATIO_SE CROP_PRODUCTION_SE
## 1 0.12166143 10020.357 3.347766e-06
## 2 0.17710249 10511.741 5.020821e-06
## 3 0.07663927 8898.762 2.289406e-06
## 4 0.10020572 9443.184 2.826054e-06
## 5 0.08759679 9271.365 2.531778e-06
## 6 0.07634677 8811.117 2.279590e-06
## 7 0.08542574 9155.312 2.480917e-06
## 8 0.19523319 10926.989 5.488902e-06
## 9 0.08718054 9033.505 2.517253e-06
## 10 0.21882724 11059.638 6.453407e-06
## Intercept_TV HDI_EDUCATION_TV TAX_GDP_RATIO_TV FIXED_PHONES_PR_TV
## 1 1.1411316 0.007745705 -0.9894845 0.8212801
## 2 1.0864473 -0.196320896 -0.3551026 0.8397788
## 3 1.2938022 0.353618312 -1.2977529 1.0079570
## 4 1.3407492 -0.200624821 -0.9796700 0.6978953
## 5 1.2356407 0.216164368 -1.3011067 0.8894872
## 6 1.3580617 0.270366614 -1.3194757 0.9834137
## 7 1.2967714 0.155492279 -1.3119332 0.8772834
## 8 0.8722072 0.096922644 -0.3367987 0.9523003
## 9 1.4412988 -0.192069261 -1.0809175 0.6855204
## 10 0.8392119 0.091029752 -0.1371599 0.9618777
## MOTORCYCLES_PR_TV CAR_PR_TV GVA_SERVICES_CAPITA_TV GVA_INDUSTRY_CAPITA_TV
## 1 -1.3109991 -0.9092424 7.795989 8.373301
## 2 -0.7951780 -0.2242656 4.204234 6.089273
## 3 -1.7875230 -1.5619231 11.249869 10.062028
## 4 -1.5249286 -0.4523583 8.591084 8.689139
## 5 -1.6376305 -1.3096271 10.243795 9.552663
## 6 -1.8069505 -1.4210419 11.200003 10.123373
## 7 -1.6738877 -1.2055234 10.331361 9.668480
## 8 -0.7077840 -0.7780040 4.326969 6.483102
## 9 -1.7099095 -0.4479731 9.601502 9.267613
## 10 -0.4933612 -0.5467082 3.100954 5.661192
## GVA_PUBLIC_CAPITA_TV GVA_AGROPEC_CAPITA_TV ECONOMY_ACTIVE_RATIO_TV
## 1 1.3741147 5.624921 -0.7524283
## 2 1.3216153 3.497428 -0.8272059
## 3 0.8349416 10.031322 -1.0847126
## 4 1.5113960 7.155899 -0.8358089
## 5 1.1134785 8.482968 -0.9237639
## 6 0.8971030 10.081171 -1.0707901
## 7 1.1353949 8.745662 -0.9359687
## 8 1.1514796 3.102738 -0.7529142
## 9 1.4695968 8.537183 -0.8900594
## 10 0.9834521 2.691298 -0.7980092
## CROP_PRODUCTION_TV Local_R2 geometry
## 1 1.622490 0.9434708 MULTIPOLYGON (((2864555 868...
## 2 2.030411 0.9196799 MULTIPOLYGON (((2826525 891...
## 3 1.497736 0.9520534 MULTIPOLYGON (((3067467 850...
## 4 1.671434 0.9430179 MULTIPOLYGON (((2997373 874...
## 5 1.516946 0.9502375 MULTIPOLYGON (((2941862 855...
## 6 1.479170 0.9517036 MULTIPOLYGON (((3052559 855...
## 7 1.510553 0.9500395 MULTIPOLYGON (((2959332 858...
## 8 1.748411 0.9247976 MULTIPOLYGON (((2695747 870...
## 9 1.576429 0.9455581 MULTIPOLYGON (((3003217 877...
## 10 1.921332 0.9101115 MULTIPOLYGON (((2510484 884...
## Intercept HDI_EDUCATION TAX_GDP_RATIO FIXED_PHONES_PR
## Min. :-7608.2 Min. :-11020 Min. :-41.057 Min. : -696.8
## 1st Qu.:-1962.7 1st Qu.: 3631 1st Qu.: 4.711 1st Qu.: 10265.8
## Median : 101.8 Median : 6959 Median : 11.369 Median : 13974.6
## Mean : -173.8 Mean : 8643 Mean : 11.216 Mean : 17340.7
## 3rd Qu.: 2238.8 3rd Qu.: 12913 3rd Qu.: 16.731 3rd Qu.: 20239.4
## Max. :13774.1 Max. : 27730 Max. : 33.510 Max. :121530.3
## MOTORCYCLES_PR CAR_PR GVA_SERVICES_CAPITA GVA_INDUSTRY_CAPITA
## Min. :-19635 Min. :-57283 Min. :-0.1193 Min. :0.1605
## 1st Qu.:-10399 1st Qu.: 4378 1st Qu.: 0.8142 1st Qu.:1.1026
## Median : -5946 Median : 9051 Median : 0.9500 Median :1.1950
## Mean : -5629 Mean : 8039 Mean : 0.9157 Mean :1.1938
## 3rd Qu.: -2193 3rd Qu.: 12540 3rd Qu.: 1.0353 3rd Qu.:1.2401
## Max. : 31469 Max. : 63061 Max. : 1.8108 Max. :2.3499
## GVA_PUBLIC_CAPITA GVA_AGROPEC_CAPITA ECONOMY_ACTIVE_RATIO CROP_PRODUCTION
## Min. :-0.1350 Min. :0.4547 Min. :-16860.9 Min. :-2.899e-06
## 1st Qu.: 0.2347 1st Qu.:0.8134 1st Qu.: -356.8 1st Qu.: 2.038e-06
## Median : 0.3183 Median :0.8681 Median : 1075.0 Median : 4.025e-06
## Mean : 0.3816 Mean :0.8704 Mean : 831.5 Mean : 3.857e-06
## 3rd Qu.: 0.4567 3rd Qu.:0.9304 3rd Qu.: 2335.6 3rd Qu.: 5.969e-06
## Max. : 0.9379 Max. :1.0882 Max. : 5314.8 Max. : 3.079e-05
## y yhat residual CV_Score
## Min. : 3191 Min. : 3459 Min. :-24526.40 Min. :0
## 1st Qu.: 9063 1st Qu.: 9326 1st Qu.: -2660.09 1st Qu.:0
## Median : 15877 Median : 16176 Median : -885.28 Median :0
## Mean : 21104 Mean : 21126 Mean : -22.15 Mean :0
## 3rd Qu.: 26156 3rd Qu.: 26878 3rd Qu.: 828.13 3rd Qu.:0
## Max. :314638 Max. :310778 Max. :258102.13 Max. :0
## Stud_residual Intercept_SE HDI_EDUCATION_SE TAX_GDP_RATIO_SE
## Min. :-3.335276 Min. : 1988 Min. : 4382 Min. : 6.854
## 1st Qu.:-0.327057 1st Qu.: 2336 1st Qu.: 4906 1st Qu.: 8.147
## Median :-0.108760 Median : 2510 Median : 5394 Median : 11.153
## Mean : 0.000472 Mean : 2873 Mean : 6203 Mean : 12.593
## 3rd Qu.: 0.103977 3rd Qu.: 2797 3rd Qu.: 5999 3rd Qu.: 13.555
## Max. :31.414135 Max. :16685 Max. :34803 Max. :142.943
## FIXED_PHONES_PR_SE MOTORCYCLES_PR_SE CAR_PR_SE GVA_SERVICES_CAPITA_SE
## Min. : 3727 Min. : 3750 Min. : 3173 Min. :0.02785
## 1st Qu.: 4663 1st Qu.: 4390 1st Qu.: 3684 1st Qu.:0.03471
## Median : 6782 Median : 4803 Median : 4848 Median :0.05455
## Mean : 14271 Mean : 5572 Mean : 9615 Mean :0.10203
## 3rd Qu.: 15479 3rd Qu.: 5413 3rd Qu.: 10567 3rd Qu.:0.14586
## Max. :289242 Max. :75757 Max. :219704 Max. :1.98576
## GVA_INDUSTRY_CAPITA_SE GVA_PUBLIC_CAPITA_SE GVA_AGROPEC_CAPITA_SE
## Min. :0.01623 Min. :0.1133 Min. :0.03220
## 1st Qu.:0.02123 1st Qu.:0.1260 1st Qu.:0.03967
## Median :0.02983 Median :0.1455 Median :0.06162
## Mean :0.06740 Mean :0.1636 Mean :0.08382
## 3rd Qu.:0.06302 3rd Qu.:0.1667 3rd Qu.:0.10581
## Max. :5.45312 Max. :1.2735 Max. :1.06179
## ECONOMY_ACTIVE_RATIO_SE CROP_PRODUCTION_SE Intercept_TV
## Min. : 1945 Min. :1.405e-06 Min. :-2.74350
## 1st Qu.: 2364 1st Qu.:2.041e-06 1st Qu.:-0.78465
## Median : 2702 Median :2.628e-06 Median : 0.03684
## Mean : 3314 Mean :3.481e-06 Mean :-0.12674
## 3rd Qu.: 3079 3rd Qu.:3.534e-06 3rd Qu.: 0.89654
## Max. :37550 Max. :1.215e-04 Max. : 2.01288
## HDI_EDUCATION_TV TAX_GDP_RATIO_TV FIXED_PHONES_PR_TV MOTORCYCLES_PR_TV
## Min. :-0.6788 Min. :-1.3195 Min. :-0.1034 Min. :-3.4033
## 1st Qu.: 0.6758 1st Qu.: 0.3383 1st Qu.: 0.9298 1st Qu.:-2.1195
## Median : 1.3238 Median : 1.0885 Median : 1.8607 Median :-1.2076
## Mean : 1.5615 Mean : 1.1291 Mean : 1.8872 Mean :-1.1948
## 3rd Qu.: 2.3293 3rd Qu.: 1.6138 3rd Qu.: 2.8543 3rd Qu.:-0.4125
## Max. : 4.3177 Max. : 3.4464 Max. : 4.3660 Max. : 1.2385
## CAR_PR_TV GVA_SERVICES_CAPITA_TV GVA_INDUSTRY_CAPITA_TV
## Min. :-2.9099 Min. :-0.09576 Min. : 0.03806
## 1st Qu.: 0.3976 1st Qu.: 6.68002 1st Qu.:19.41404
## Median : 1.1671 Median :13.46143 Median :40.93876
## Mean : 1.5616 Mean :16.76509 Mean :37.13200
## 3rd Qu.: 2.7332 3rd Qu.:27.23641 3rd Qu.:53.30040
## Max. : 5.0161 Max. :37.02155 Max. :69.35577
## GVA_PUBLIC_CAPITA_TV GVA_AGROPEC_CAPITA_TV ECONOMY_ACTIVE_RATIO_TV
## Min. :-0.4434 Min. : 0.6022 Min. :-1.5165
## 1st Qu.: 1.4851 1st Qu.: 8.7650 1st Qu.:-0.1015
## Median : 1.9219 Median :13.4109 Median : 0.3749
## Mean : 2.6895 Mean :14.6121 Mean : 0.4154
## 3rd Qu.: 3.1238 3rd Qu.:20.5533 3rd Qu.: 0.8835
## Max. : 7.7826 Max. :25.7997 Max. : 2.2270
## CROP_PRODUCTION_TV Local_R2 geometry
## Min. :-1.1902 Min. :0.7231 MULTIPOLYGON :5554
## 1st Qu.: 0.6101 1st Qu.:0.8019 epsg:5641 : 0
## Median : 1.1075 Median :0.8429 +proj=merc...: 0
## Mean : 1.4155 Mean :0.8641
## 3rd Qu.: 2.2487 3rd Qu.:0.9527
## Max. : 5.9771 Max. :0.9971
# Absolute t-values, used to flag coefficients with |t| > 1.96 on the maps below
brazil_sf_fixed <- brazil_sf_fixed %>%
  mutate(CROP_PRODUCTION_T = abs(CROP_PRODUCTION_TV),
         ECONOMY_ACTIVE_RATIO_T = abs(ECONOMY_ACTIVE_RATIO_TV),
         GVA_AGROPEC_CAPITA_T = abs(GVA_AGROPEC_CAPITA_TV),
         GVA_PUBLIC_CAPITA_T = abs(GVA_PUBLIC_CAPITA_TV),
         GVA_INDUSTRY_CAPITA_T = abs(GVA_INDUSTRY_CAPITA_TV),
         GVA_SERVICES_CAPITA_T = abs(GVA_SERVICES_CAPITA_TV),
         CAR_PR_T = abs(CAR_PR_TV),
         FIXED_PHONES_PR_T = abs(FIXED_PHONES_PR_TV),
         TAX_GDP_RATIO_T = abs(TAX_GDP_RATIO_TV),
         HDI_EDUCATION_T = abs(HDI_EDUCATION_TV),
         Intercept_T = abs(Intercept_TV))
The map above shows the local R2, that is, how well the local regression model fits the observed y values. Local R2 ranges from about 0.72 to 1.00, so the GWR model performs well in most municipalities, especially in the North of Brazil. However, just north of the South of Brazil there is a cluster of lower R2 values, where the model does not perform as well. This suggests that important explanatory variables may be missing from the regression model for those municipalities.
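The absolute t-value columns built with mutate() can also be generated from the column names instead of being typed one by one. The following is only a base R sketch on a small stand-in data frame (`toy` is a hypothetical name; its values are copied from the first three rows of the printed GWR output), not the code used in this analysis. Unlike the mutate() call, it creates a `_T` column for every `_TV` column it finds.

```r
# Toy data frame standing in for brazil_sf_fixed (values copied from the
# first three rows of the printed GWR output).
toy <- data.frame(
  CAR_PR_TV          = c(-0.9092424, -0.2242656, -1.5619231),
  CROP_PRODUCTION_TV = c(1.622490, 2.030411, 1.497736)
)

# Find every t-value column and create the matching absolute-value column,
# e.g. CAR_PR_TV -> CAR_PR_T.
tv_cols <- grep("_TV$", names(toy), value = TRUE)
for (nm in tv_cols) {
  toy[[sub("_TV$", "_T", nm)]] <- abs(toy[[nm]])
}

print(toy)
```

Working column by column with `[[` keeps the loop independent of how many coefficients the model has.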
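The map above shows the distribution of the residuals of GDP per Capita. Most residuals are small relative to GDP per Capita (the middle half lies between about -2,660 and 828), which indicates that the model fits well in general, although a few municipalities have very large residuals (up to about 258,102).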
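A numeric check can complement the residual map. The sketch below flags studentised residuals whose absolute value exceeds 3; the cut-off of 3 is an assumed rule of thumb and not part of the original analysis, and `stud_res` holds only the first ten Stud_residual values from the printed output, so it illustrates the idea rather than reproducing the full result.

```r
# First ten studentised residuals, copied from the printed GWR output.
stud_res <- c(0.22618396, 0.90489398, 0.08509723, -0.36050367, -0.26724254,
              0.56873955, -0.36146002, 0.08244513, -0.11577234, 1.10904302)

# Assumed rule-of-thumb cut-off for "unusually large" residuals.
threshold <- 3

# TRUE where the residual is larger in magnitude than the cut-off.
outlier <- abs(stud_res) > threshold

cat("Flagged:", sum(outlier), "of", length(stud_res), "municipalities\n")
```

On the full data the same test could be applied to the Stud_residual column of brazil_sf_fixed, whose summary shows a maximum of about 31.4, so some municipalities would be flagged there.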
# Predicted (yhat) and observed (y) GDP per capita, side by side
v1 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "yhat", border.alpha = 0.15)
v2 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "y", border.alpha = 0.15)
tmap_arrange(v1, v2, ncol = 2)
The maps of predicted (yhat) and observed (y) values show closely matching spatial patterns, which indicates that the GWR model reproduces the observed GDP per Capita well.
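The visual agreement between the two maps can be backed by simple summary measures. The sketch below computes the correlation and the root mean squared error (RMSE) between y and yhat for the first ten municipalities in the printed output; both measures are my choice for illustration and were not reported in the original analysis.

```r
# Observed and predicted GDP per capita for the first ten features,
# copied from the printed GWR output.
y <- c(18732.17, 20618.18, 21202.96, 22130.78, 22721.08,
       16410.55, 27040.60, 12102.82, 16650.27, 15732.01)
yhat <- c(17001.499, 14492.891, 20555.073, 24927.210, 24789.847,
          11972.815, 29787.563, 11531.323, 17557.060, 8463.108)

# Residual as reported by GWR: observed minus predicted.
res <- y - yhat

# Pearson correlation and RMSE between observed and predicted values.
r    <- cor(y, yhat)
rmse <- sqrt(mean(res^2))

cat("correlation:", round(r, 3), " RMSE:", round(rmse, 1), "\n")
```

Applied to the full y and yhat columns of brazil_sf_fixed, the same two lines would summarise the fit across all 5554 municipalities.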
# Local coefficient (left) and absolute t-value (right) for CROP_PRODUCTION
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "CROP_PRODUCTION", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "CROP_PRODUCTION_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
CROP_PRODUCTION is significant (|t| > 1.96, roughly the 5% level) in the Central portion of Brazil. The general trend is that a unit increase in CROP_PRODUCTION is associated with an increase in GDP per Capita.
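The reading of each pair of maps combines two things: the sign of the local coefficient and whether its absolute t-value exceeds 1.96. The sketch below makes that rule explicit for the first ten CROP_PRODUCTION estimates in the printed output; the class labels and the variable names (`coef_cp`, `tv_cp`, `cls`) are illustrative and do not come from the original analysis.

```r
# Local CROP_PRODUCTION coefficients and t-values for the first ten
# features, copied from the printed GWR output.
coef_cp <- c(5.431718e-06, 1.019433e-05, 3.428926e-06, 4.723564e-06,
             3.840570e-06, 3.371903e-06, 3.747556e-06, 9.596858e-06,
             3.968270e-06, 1.239914e-05)
tv_cp <- c(1.622490, 2.030411, 1.497736, 1.671434, 1.516946,
           1.479170, 1.510553, 1.748411, 1.576429, 1.921332)

# Same cut-off as the map breaks: |t| > 1.96.
significant <- abs(tv_cp) > 1.96

# Combine significance with the sign of the coefficient.
cls <- ifelse(!significant, "not significant",
              ifelse(coef_cp > 0, "significant positive",
                     "significant negative"))

print(table(cls))
```

Tabulating such classes over all municipalities would give the share of Brazil in which a variable is significant, which the maps only show visually.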
# Local coefficient (left) and absolute t-value (right) for ECONOMY_ACTIVE_RATIO
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "ECONOMY_ACTIVE_RATIO", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "ECONOMY_ACTIVE_RATIO_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
The Economy Active Ratio is significant only in a small proportion of municipalities, located between the Central and South of Brazil. In that area, an increase in the Economy Active Ratio is associated with an increase in GDP per Capita.
# Local coefficient (left) and absolute t-value (right) for GVA_AGROPEC_CAPITA
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_AGROPEC_CAPITA", border.alpha = 0.15, palette = "Reds")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_AGROPEC_CAPITA_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
GVA_AGROPEC_CAPITA is significant in a large proportion of Brazil. The general trend is that a unit increase in GVA_AGROPEC_CAPITA is associated with an increase in GDP per Capita. The estimate is higher in the North, suggesting that agropecuary (agriculture and livestock) activity contributes more strongly to GDP per Capita there.
# Local coefficient (left) and absolute t-value (right) for GVA_PUBLIC_CAPITA
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_PUBLIC_CAPITA", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_PUBLIC_CAPITA_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
GVA_PUBLIC_CAPITA is significant in a smaller proportion of Brazil. The general trend is that a unit increase in GVA_PUBLIC_CAPITA is associated with an increase in GDP per Capita. The estimate is higher in the South, suggesting that public-sector activity contributes more strongly to GDP per Capita there.
# Local coefficient (left) and absolute t-value (right) for GVA_INDUSTRY_CAPITA
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_INDUSTRY_CAPITA", border.alpha = 0.15, palette = "Reds")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_INDUSTRY_CAPITA_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
GVA_INDUSTRY_CAPITA is significant in a large proportion of Brazil. The general trend is that a unit increase in GVA_INDUSTRY_CAPITA is associated with an increase in GDP per Capita. The estimate is higher in the West, suggesting that industrial activity contributes more strongly to GDP per Capita there.
# Local coefficient (left) and absolute t-value (right) for GVA_SERVICES_CAPITA
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_SERVICES_CAPITA", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "GVA_SERVICES_CAPITA_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
GVA_SERVICES_CAPITA is significant in a large proportion of Brazil. The general trend is that a unit increase in GVA_SERVICES_CAPITA is associated with an increase in GDP per Capita. The estimate is higher in the West, suggesting that services contribute more strongly to GDP per Capita there.
# Local coefficient (left) and absolute t-value (right) for CAR_PR
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "CAR_PR", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "CAR_PR_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
CAR_PR is significant mainly towards the South of Brazil. The trends are mixed: in some places a unit increase in CAR_PR is associated with an increase in GDP per Capita, and in others with a decrease.
# Local coefficient (left) and absolute t-value (right) for FIXED_PHONES_PR
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "FIXED_PHONES_PR", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "FIXED_PHONES_PR_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
FIXED_PHONES_PR is significant towards the South and East of Brazil. The trends are mixed: in some places a unit increase in FIXED_PHONES_PR is associated with an increase in GDP per Capita, and in others with a decrease.
# Local coefficient (left) and absolute t-value (right) for TAX_GDP_RATIO
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "TAX_GDP_RATIO", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "TAX_GDP_RATIO_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
The Tax to GDP Ratio is significant towards the South and in a small part of the North of Brazil. The general trend is that a unit increase in the Tax to GDP Ratio is associated with an increase in GDP per Capita.
# Local coefficient (left) and absolute t-value (right) for HDI_EDUCATION
v3 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "HDI_EDUCATION", border.alpha = 0.15, palette = "-RdBu")
v4 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "HDI_EDUCATION_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v3, v4, ncol = 2)
HDI Education is significant towards the South and Central Brazil. The general trend is that a unit increase in HDI Education is associated with an increase in GDP per Capita.
# Local intercept (left) and its absolute t-value (right)
v9 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "Intercept", palette = "-RdBu", border.alpha = 0.15)
v10 <- tm_shape(brazil_sf_fixed) +
  tm_polygons(col = "Intercept_T", border.alpha = 0.15, palette = "-RdBu", breaks = c(0, 1.96, Inf))
tmap_arrange(v9, v10, ncol = 2)
The intercept is the expected mean value of GDP per Capita when all the independent variables are equal to 0.
The Intercept is significant towards the South of Brazil. In those municipalities, the model predicts a negative GDP per Capita when all the variables are equal to 0.