In this take-home exercise, you are tasked with determining the factors affecting the unequal development of Brazil at the municipality level using the data provided. The specific tasks of the analysis are as follows:
Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level.
Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using multiple linear regression method.
Prepare a choropleth map showing the distribution of the residual of the GDP per capita.
Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using geographically weighted regression method.
Prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.
The R packages needed for this exercise are as follows:
Geospatial statistical modelling: GWmodel, heatmaply, spatstat
Spatial data handling: sf, geobr
Attribute data handling: tidyverse, readr, ggplot2 and dplyr
Choropleth mapping: tmap
Saving and loading geospatial data: rgdal (for easier loading of data)
The code chunk below installs any of these R packages that are missing and loads them all into the R environment.
# Packages used throughout the exercise
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'rgdal', 'heatmaply', 'spatstat')
for (p in packages){
  # Install the package only if it cannot be loaded
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# Retrieves a quick breakdown of the number of NA rows and invalid polygons/points
Validity_NA_Check <- function(target_st) {
validity <- st_is_valid(target_st)
NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
Invalid_rows <- which(validity==FALSE)
print(paste("For:", deparse(substitute(target_st))))
print(paste("Number of Invalid polygons/points is:", length(Invalid_rows)))
print(paste("Number of NA rows is:", nrow((NA_rows))))
}
# Retrieves the exact polygon which is invalid
get_invalid <- function(target_st) {
validity <- st_is_valid(target_st)
Invalid_rows <- which(validity==FALSE)
return(Invalid_rows)
}
# Retrieves the exact rows which contain NA values for you to check the columns
get_NA_rows <- function(target_st) {
NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
return(NA_rows)
}
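# A cleaning function that replaces NA with 0 in the given column so that calculations can still be done.
## This function is a little unnecessary as we will not be using the data attached to the geospatial points.
replace_NA_with_zero <- function(x, column_name){
  # Use [[ ]] so that column_name is evaluated; x$column_name would look for a column literally named "column_name"
  x[[column_name]][is.na(x[[column_name]])] <- 0
  # Return the modified object, since R does not change the caller's copy
  return(x)
}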
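Helpers that take a column name as a string depend on how R resolves `$` versus `[[`. The sketch below uses only base R and a made-up toy data frame. The helper `zero_fill_column` is hypothetical, named here purely for illustration, and is not part of the exercise code.

```r
# Toy data (illustrative values only)
toy <- data.frame(city = c("A", "B", "C"), pop = c(10, NA, 30))

column_name <- "pop"

# `$` does not evaluate its argument: it looks for a column called "column_name"
dollar_lookup_is_null <- is.null(toy$column_name)

# `[[` evaluates the variable, so it finds the "pop" column
bracket_lookup_matches <- identical(toy[[column_name]], toy$pop)

# Hypothetical helper: zero-fill NAs in one column and return the modified copy
zero_fill_column <- function(df, column_name) {
  df[[column_name]][is.na(df[[column_name]])] <- 0
  df
}

filled <- zero_fill_column(toy, "pop")
print(filled)
```

The original `toy` is left unchanged, so the result has to be assigned back for the replacement to take effect.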
BRAZIL_CITIES.csv is a semicolon-delimited text file. The code chunk below uses the read_delim() function of the readr package to import it into R as a tibble data frame called Brazil_cities_raw. The accompanying data dictionary is imported in the same way as Reference.
Brazil_cities_raw = read_delim("data/aspatial/BRAZIL_CITIES.csv", ";")
Reference = read_delim("data/aspatial/Data_Dictionary.csv", ";")
summary(Brazil_cities_raw)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5573 Length:5573 Min. :0.000000 Min. : 805
## Class :character Class :character 1st Qu.:0.000000 1st Qu.: 5235
## Mode :character Mode :character Median :0.000000 Median : 10934
## Mean :0.004845 Mean : 34278
## 3rd Qu.:0.000000 3rd Qu.: 23424
## Max. :1.000000 Max. :11253503
## NA's :8
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.0 Min. : 239 Min. : 60
## 1st Qu.: 5230 1st Qu.: 0.0 1st Qu.: 1572 1st Qu.: 874
## Median : 10926 Median : 0.0 Median : 3174 Median : 1846
## Mean : 34200 Mean : 77.5 Mean : 10303 Mean : 8859
## 3rd Qu.: 23390 3rd Qu.: 10.0 3rd Qu.: 6726 3rd Qu.: 4624
## Max. :11133776 Max. :119727.0 Max. :3576148 Max. :3548433
## NA's :8 NA's :8 NA's :10 NA's :10
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 487 1st Qu.: 2801 1st Qu.: 38.0 1st Qu.: 158
## Median : 931 Median : 6170 Median : 92.0 Median : 376
## Mean : 1463 Mean : 27595 Mean : 383.3 Mean : 1544
## 3rd Qu.: 1832 3rd Qu.: 15302 3rd Qu.: 232.0 3rd Qu.: 951
## Max. :33809 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :81 NA's :8 NA's :8 NA's :8
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 259 1st Qu.: 1734 1st Qu.: 341
## Median : 516 Median : 588 Median : 3841 Median : 722
## Mean : 2069 Mean : 2381 Mean : 18212 Mean : 3004
## 3rd Qu.: 1300 3rd Qu.: 1478 3rd Qu.: 9628 3rd Qu.: 1724
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :8 NA's :8 NA's :8 NA's :8
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2326 1st Qu.:1392 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2783 Median :0.6650
## Mean : 14179.9 Mean : 57384 Mean :2783 Mean :0.6592
## 3rd Qu.: 11194.2 3rd Qu.: 55619 3rd Qu.:4174 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :3 NA's :3 NA's :8 NA's :8
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.40
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :8 NA's :8 NA's :8 NA's :9
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1 Min. : 3
## 1st Qu.:-22.838 1st Qu.: 169.8 1st Qu.: 88 1st Qu.: 119
## Median :-18.089 Median : 406.5 Median : 247 Median : 327
## Mean :-16.444 Mean : 893.8 Mean : 3094 Mean : 6567
## 3rd Qu.: -8.489 3rd Qu.: 628.9 3rd Qu.: 815 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668 Max. :5543127
## NA's :9 NA's :9 NA's :3 NA's :3
## AREA REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## Min. : 3.57 Length:5573 Length:5573 Min. : 786
## 1st Qu.: 204.44 Class :character Class :character 1st Qu.: 5454
## Median : 416.59 Mode :character Mode :character Median : 11590
## Mean : 1517.44 Mean : 37432
## 3rd Qu.: 1026.57 3rd Qu.: 25296
## Max. :159533.33 Max. :12176866
## NA's :3 NA's :3
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## Length:5573 Min. : 0 Min. : 1 Min. : 2
## Class :character 1st Qu.: 4189 1st Qu.: 1726 1st Qu.: 10112
## Mode :character Median : 20426 Median : 7424 Median : 31211
## Mean : 47271 Mean : 175928 Mean : 489451
## 3rd Qu.: 51227 3rd Qu.: 41022 3rd Qu.: 115406
## Max. :1402282 Max. :63306755 Max. :464656988
## NA's :3 NA's :3 NA's :3
## GVA_PUBLIC GVA_TOTAL TAXES GDP
## Min. : 7 Min. : 17 Min. : -14159 Min. : 15
## 1st Qu.: 17267 1st Qu.: 42253 1st Qu.: 1305 1st Qu.: 43709
## Median : 35866 Median : 119492 Median : 5100 Median : 125153
## Mean : 123768 Mean : 832987 Mean : 118864 Mean : 954584
## 3rd Qu.: 89245 3rd Qu.: 313963 3rd Qu.: 22197 3rd Qu.: 329539
## Max. :41902893 Max. :569910503 Max. :117125387 Max. :687035890
## NA's :3 NA's :3 NA's :3 NA's :3
## POP_GDP GDP_CAPITA GVA_MAIN MUN_EXPENDIT
## Min. : 815 Min. : 3191 Length:5573 Min. :1.421e+06
## 1st Qu.: 5483 1st Qu.: 9058 Class :character 1st Qu.:1.573e+07
## Median : 11578 Median : 15870 Mode :character Median :2.746e+07
## Mean : 36998 Mean : 21126 Mean :1.043e+08
## 3rd Qu.: 25085 3rd Qu.: 26155 3rd Qu.:5.666e+07
## Max. :12038175 Max. :314638 Max. :4.577e+10
## NA's :3 NA's :3 NA's :1492
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 162.0 Median : 2.00 Median : 0.000 Median : 11.00
## Mean : 906.8 Mean : 18.25 Mean : 1.852 Mean : 73.44
## 3rd Qu.: 448.0 3rd Qu.: 8.00 3rd Qu.: 2.000 3rd Qu.: 39.00
## Max. :530446.0 Max. :1948.00 Max. :274.000 Max. :31566.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 1.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 32.0
## Median : 0.0000 Median : 0.000 Median : 4.00 Median : 74.5
## Mean : 0.4262 Mean : 2.029 Mean : 43.26 Mean : 348.0
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00 3rd Qu.: 199.0
## Max. :332.0000 Max. :657.000 Max. :25222.00 Max. :150633.0
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41 Mean : 55.88 Mean : 24.74 Mean : 15.55
## 3rd Qu.: 25 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515 Max. :29290.00 Max. :38720.00 Max. :23738.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 0.00 Median : 4.00 Median : 4.0 Median : 2.000
## Mean : 15.14 Mean : 51.29 Mean : 83.7 Mean : 3.269
## 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.0 3rd Qu.: 3.000
## Max. :14003.00 Max. :49181.00 Max. :76757.0 Max. :204.000
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_P COMP_Q COMP_R COMP_S
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.: 5.00
## Median : 6.00 Median : 3.00 Median : 2.00 Median : 12.00
## Mean : 30.96 Mean : 34.15 Mean : 12.18 Mean : 51.61
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00 3rd Qu.: 31.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00 Max. :24832.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_T COMP_U HOTELS BEDS
## Min. :0 Min. : 0.00000 Min. : 1.000 Min. : 2.0
## 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000 1st Qu.: 40.0
## Median :0 Median : 0.00000 Median : 1.000 Median : 82.0
## Mean :0 Mean : 0.05027 Mean : 3.131 Mean : 257.5
## 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000 3rd Qu.: 200.0
## Max. :0 Max. :123.00000 Max. :97.000 Max. :13247.0
## NA's :3 NA's :3 NA's :4686 NA's :4686
## Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.00
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000 1st Qu.:1.00
## Median : 1.000 Median : 2.000 Median : 1.000 Median :2.00
## Mean : 3.383 Mean : 2.829 Mean : 1.312 Mean :1.58
## 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.:2.00
## Max. :1693.000 Max. :626.000 Max. :83.000 Max. :8.00
## NA's :2231 NA's :2231 NA's :2231 NA's :2231
## Pr_Assets Pu_Assets Cars Motorcycles
## Min. :0.000e+00 Min. :0.000e+00 Min. : 2 Min. : 4
## 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602 1st Qu.: 591
## Median :3.231e+07 Median :1.339e+08 Median : 1438 Median : 1285
## Mean :9.180e+09 Mean :6.005e+09 Mean : 9859 Mean : 4879
## 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4086 3rd Qu.: 3294
## Max. :1.947e+13 Max. :8.016e+12 Max. :5740995 Max. :1134570
## NA's :2231 NA's :2231 NA's :11 NA's :11
## Wheeled_tractor UBER MAC WAL-MART
## Min. : 0.000 Min. :1 Min. : 1.000 Min. : 1.000
## 1st Qu.: 0.000 1st Qu.:1 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 0.000 Median :1 Median : 2.000 Median : 1.000
## Mean : 5.754 Mean :1 Mean : 4.277 Mean : 2.059
## 3rd Qu.: 1.000 3rd Qu.:1 3rd Qu.: 3.000 3rd Qu.: 1.750
## Max. :3236.000 Max. :1 Max. :130.000 Max. :26.000
## NA's :11 NA's :5448 NA's :5407 NA's :5471
## POST_OFFICES
## Min. : 1.000
## 1st Qu.: 1.000
## Median : 1.000
## Mean : 2.081
## 3rd Qu.: 2.000
## Max. :225.000
## NA's :120
Extensive data cleaning is required before the data can be used to calibrate regression models.
Unfortunately, the summary above shows that many columns contain missing values, and almost every row is missing at least one value. We will clean the dataset as best we can so that we can derive the indicators needed to test which variables affect GDP per capita.
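As a rough illustration of how the extent of missing data can be quantified, the sketch below counts NA values per column and the share of rows with at least one NA. It uses base R on a made-up toy data frame. The column names and values are assumptions for illustration, not taken from the dataset.

```r
# Toy data (illustrative values only)
toy <- data.frame(
  city = c("A", "B", "C", "D"),
  gdp  = c(100, NA, 300, 400),
  uber = c(NA, NA, 1, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number and share of rows that have at least one missing value
rows_with_na  <- sum(!complete.cases(toy))
share_with_na <- rows_with_na / nrow(toy)

print(na_per_column)
print(share_with_na)
```

A column like `uber` here, which is missing for most rows, makes nearly every row incomplete on its own, which is why per-column counts are more informative than the row share alone.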
which(duplicated(Brazil_cities_raw[,1]))
## [1] 48 50 51 91 142 143 159 179 207 226 261 270 318 352 370
## [16] 418 434 484 497 508 517 539 551 563 582 583 591 634 635 644
## [31] 657 670 671 676 677 678 679 693 703 704 709 715 716 717 730
## [46] 766 813 851 856 857 877 885 939 957 973 1007 1009 1015 1041 1042
## [61] 1049 1058 1089 1102 1162 1184 1210 1212 1217 1306 1317 1351 1353 1485 1486
## [76] 1535 1620 1646 1673 1699 1723 1748 1762 1790 1805 1827 1901 1982 2004 2006
## [91] 2062 2072 2163 2189 2195 2198 2253 2258 2273 2285 2327 2343 2344 2375 2381
## [106] 2393 2465 2489 2514 2531 2539 2547 2557 2640 2652 2661 2662 2702 2707 2713
## [121] 2724 2744 2935 2992 3053 3062 3082 3135 3151 3182 3213 3216 3217 3245 3251
## [136] 3298 3324 3354 3356 3357 3378 3387 3390 3405 3406 3422 3483 3484 3490 3502
## [151] 3521 3533 3536 3552 3580 3625 3635 3659 3670 3693 3702 3764 3785 3789 3811
## [166] 3813 3845 3868 3880 3881 3882 4003 4008 4015 4019 4025 4027 4031 4040 4073
## [181] 4092 4116 4141 4148 4152 4158 4195 4201 4232 4296 4312 4324 4351 4363 4369
## [196] 4370 4397 4401 4402 4403 4407 4408 4409 4411 4419 4422 4423 4424 4433 4454
## [211] 4473 4482 4488 4489 4490 4499 4538 4590 4611 4617 4618 4619 4620 4643 4644
## [226] 4645 4651 4663 4674 4686 4688 4724 4776 4829 4862 4891 4912 4917 4924 4937
## [241] 4941 5027 5038 5074 5077 5085 5115 5145 5156 5159 5162 5164 5191 5207 5222
## [256] 5226 5258 5302 5305 5306 5340 5346 5425 5435 5439 5450 5457 5471 5472 5473
## [271] 5491 5498 5499 5556
which(duplicated(Brazil_cities_raw[,1:2]))
## integer(0)
The output above shows that a large number of city names are repeated, while no combination of CITY and STATE is. Repeated names could cause problems in later joining operations, so we will create unique identifiers by combining CITY with the STATE column before performing any joins.
Brazil_cities_uniques <- cbind(CITY_STATE = paste(Brazil_cities_raw$CITY, Brazil_cities_raw$STATE, sep="_"), Brazil_cities_raw)
which(duplicated(Brazil_cities_uniques[,1]))
## integer(0)
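The composite-key idea above can be sketched on a small made-up table. The city and state values below are placeholders chosen for illustration. Only `duplicated()`, `anyDuplicated()` and `paste()` from base R are used.

```r
# Toy data: two names repeat across states (illustrative values only)
toy <- data.frame(
  CITY  = c("Alpha", "Alpha", "Beta", "Beta", "Gamma"),
  STATE = c("RS", "SP", "MS", "PE", "RS")
)

# Duplicates when the name alone is used as the key
dup_by_name <- sum(duplicated(toy$CITY))

# Duplicates when name and state together form the key
dup_by_pair <- sum(duplicated(toy[, c("CITY", "STATE")]))

# Build the composite identifier and confirm it is unique
toy$CITY_STATE <- paste(toy$CITY, toy$STATE, sep = "_")
key_is_unique <- anyDuplicated(toy$CITY_STATE) == 0

print(toy)
```

One caveat with pasted keys: if a name could itself contain the separator, two different pairs might collapse to the same string, so checking uniqueness after building the key is a cheap safeguard.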
For the purpose of our analysis, since we are looking at factors that might contribute to differences in GDP per capita, we will use the Data_Dictionary to identify and remove variables that were measured after 2016.
NOTE: This is important because building explanatory models on factors measured after the event would be a logical fallacy and could introduce reverse causation. This does not affect variables such as area, which stay constant over time. We will still have enough variables and derived variables to perform our analysis.
We will also remove MUN_EXPENDIT because of its large number of missing data points (1,492 NA values) and our inability to properly estimate these values from external sources. Because this column is missing far more values than the others, dropping the column is preferable to dropping every row in which it is missing.
Lastly, we will also remove COMP_T, as it carries no information: every recorded value is 0.
drops <- c("IBGE_PLANTED_AREA","IBGE_CROP_PRODUCTION_$", "PAY_TV", "FIXED_PHONES", "ESTIMATED_POP", "REGIAO_TUR", "CATEGORIA_TUR", "HOTELS", "BEDS", "Pr_Agencies", "Pu_Agencies", "Pr_Bank", "Pu_Bank", "Pr_Assets", "Pu_Assets", "Cars", "Motorcycles", "Wheeled_tractor", "UBER", "MAC", "WAL-MART", "POST_OFFICES", "MUN_EXPENDIT", "COMP_T")
Brazil_cities_2016 <- Brazil_cities_uniques[ , !(names(Brazil_cities_uniques) %in% drops)]
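The reasoning for dropping a whole column rather than rows can be made explicit by looking at the share of missing values per column and at columns that never vary. The following is a minimal sketch on toy data. The 25% threshold and all column names are assumptions chosen for illustration, not values used in the exercise.

```r
# Toy data (illustrative values only)
toy <- data.frame(
  gdp_capita = c(10, 20, 30, 40, 50),
  mostly_na  = c(NA, NA, NA, 1, 2),
  all_zero   = c(0, 0, 0, 0, 0),
  area       = c(5, NA, 7, 8, 9)
)

# Share of missing values in each column
na_share <- colMeans(is.na(toy))

# Assumed cut-off: columns missing in more than 25% of rows are dropped
threshold <- 0.25
high_na <- names(na_share)[na_share > threshold]

# Columns whose non-missing values are all identical carry no information
is_constant <- vapply(
  toy,
  function(col) length(unique(col[!is.na(col)])) <= 1,
  logical(1)
)
constant <- names(toy)[is_constant]

# Drop both groups using the same %in% pattern as the chunk above
toy_drops <- union(high_na, constant)
cleaned <- toy[, !(names(toy) %in% toy_drops), drop = FALSE]

print(names(cleaned))
```

Columns with only a few gaps, such as `area` here, are kept, and the affected rows can be handled individually later.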
If the dependent variable is missing for a city, that city unfortunately cannot be used in our analysis.
Missing_GDP_PC <- Brazil_cities_2016[(is.na(Brazil_cities_2016$GDP_CAPITA))!=0,]
Missing_GDP_PC
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP
## 2702 Lagoa Dos Patos_RS Lagoa Dos Patos RS 0 NA
## 4482 Santa Teresinha_BA Santa Teresinha BA 0 NA
## 4606 São Caetano_PE São Caetano PE 0 NA
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 2702 NA NA NA NA NA
## 4482 NA NA NA NA NA
## 4606 NA NA NA NA NA
## IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 2702 NA NA NA NA NA NA NA
## 4482 NA NA NA NA NA NA NA
## 4606 NA NA NA NA NA NA NA
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## 2702 NA NA NA NA NA NA
## 4482 4493 0.59 0.549 0.804 0.459 -39.52114
## 4606 NA NA NA NA NA NA
## LAT ALT AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 2702 NA NA 10158.75 <NA> NA NA
## 4482 -12.77285 222.51 NA <NA> NA NA
## 4606 NA NA NA <NA> NA NA
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA GVA_MAIN
## 2702 NA NA NA NA NA NA NA <NA>
## 4482 NA NA NA NA NA NA NA <NA>
## 4606 NA NA NA NA NA NA NA <NA>
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2702 NA NA NA NA NA NA NA NA NA NA
## 4482 NA NA NA NA NA NA NA NA NA NA
## 4606 NA NA NA NA NA NA NA NA NA NA
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2702 NA NA NA NA NA NA NA NA NA NA
## 4482 NA NA NA NA NA NA NA NA NA NA
## 4606 NA NA NA NA NA NA NA NA NA NA
## COMP_U
## 2702 NA
## 4482 NA
## 4606 NA
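The same kind of filter can be written with a single logical mask, which makes it easy to keep the removed rows for inspection and to confirm that no rows are lost in the split. This is a sketch on toy data. The identifiers and values are made up for illustration.

```r
# Toy data (illustrative values only)
toy <- data.frame(
  CITY_STATE = c("A_RS", "B_BA", "C_PE", "D_SP"),
  GDP_CAPITA = c(17500, NA, NA, 8800)
)

# TRUE where the dependent variable is present
has_gdp <- !is.na(toy$GDP_CAPITA)

# Rows to inspect and rows to keep for modelling
missing_rows <- toy[!has_gdp, ]
kept_rows    <- toy[has_gdp, ]

# Sanity check: the two parts together account for every original row
stopifnot(nrow(missing_rows) + nrow(kept_rows) == nrow(toy))

print(missing_rows$CITY_STATE)
```

Comparing `nrow(kept_rows)` against an expected count from an external source is a quick way to spot rows that should not be in the data.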
According to Wikipedia, Brazil has 5,570 municipalities. However, our dataset contains 5,573 rows, which suggests that the 3 cities with missing GDP per capita are probably not counted among the official municipalities. We will therefore remove them, assuming they are irrelevant to our study. Source: https://en.wikipedia.org/wiki/Municipalities_of_Brazil
Brazil_cities_allGDPC <- Brazil_cities_2016[(is.na(Brazil_cities_2016$GDP_CAPITA))==0,]
summary((Brazil_cities_allGDPC))
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5570 Length:5570
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5564
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median :0.000000 Median : 10934 Median : 10926 Median : 0.0
## Mean :0.004847 Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.:0.000000 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.0
## NA's :5 NA's :5 NA's :5
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 3 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 487 1st Qu.: 2801
## Median : 3174 Median : 1846 Median : 931 Median : 6170
## Mean : 10303 Mean : 8859 Mean : 1463 Mean : 27595
## 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1832 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
## NA's :7 NA's :7 NA's :78 NA's :5
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
## NA's :5 NA's :5 NA's :5 NA's :5
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29 Min. : 1 Min. :0.4180
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.:1392 1st Qu.:0.5990
## Median : 3841 Median : 722 Median :2782 Median :0.6650
## Mean : 18212 Mean : 3004 Mean :2783 Mean :0.6592
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012 Max. :5565 Max. :0.8620
## NA's :5 NA's :5 NA's :6 NA's :6
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :6 NA's :6 NA's :6 NA's :7
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5570
## 1st Qu.:-22.838 1st Qu.: 169.7 1st Qu.: 204.43 Class :character
## Median :-18.090 Median : 406.5 Median : 415.92 Mode :character
## Mean :-16.445 Mean : 894.0 Mean : 1515.89
## 3rd Qu.: -8.489 3rd Qu.: 629.0 3rd Qu.: 1026.38
## Max. : 4.585 Max. :874579.0 Max. :159533.33
## NA's :7 NA's :7 NA's :1
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4189 1st Qu.: 1726 1st Qu.: 10112 1st Qu.: 17267
## Median : 20426 Median : 7424 Median : 31211 Median : 35866
## Mean : 47271 Mean : 175928 Mean : 489451 Mean : 123768
## 3rd Qu.: 51227 3rd Qu.: 41022 3rd Qu.: 115406 3rd Qu.: 89245
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42253 1st Qu.: 1305 1st Qu.: 43709 1st Qu.: 5483
## Median : 119492 Median : 5100 Median : 125153 Median : 11578
## Mean : 832987 Mean : 118864 Mean : 954584 Mean : 36998
## 3rd Qu.: 313963 3rd Qu.: 22197 3rd Qu.: 329539 3rd Qu.: 25085
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5570 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9058 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15870 Mode :character Median : 162.0 Median : 2.00
## Mean : 21126 Mean : 906.8 Mean : 18.25
## 3rd Qu.: 26155 3rd Qu.: 448.0 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.00 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.000 Median : 11.00 Median : 0.0000 Median : 0.000
## Mean : 1.852 Mean : 73.44 Mean : 0.4262 Mean : 2.029
## 3rd Qu.: 2.000 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :274.000 Max. :31566.00 Max. :332.0000 Max. :657.000
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1 1st Qu.: 2.00
## Median : 4.00 Median : 74.5 Median : 7 Median : 7.00
## Mean : 43.26 Mean : 348.0 Mean : 41 Mean : 55.88
## 3rd Qu.: 15.00 3rd Qu.: 199.0 3rd Qu.: 25 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.74 Mean : 15.55 Mean : 15.14 Mean : 51.29
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.0 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.0 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.7 Mean : 3.269 Mean : 30.96 Mean : 34.15
## 3rd Qu.: 14.0 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.0 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.18 Mean : 51.61 Mean : 0.05027
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
Brazil_cities_allGDPC[(is.na(Brazil_cities_allGDPC$IBGE_RES_POP_ESTR))!=0,]
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP
## 472 Balneário Rincão_SC Balneário Rincão SC 0 NA
## 3117 Mojuí Dos Campos_PA Mojuí Dos Campos PA 0 NA
## 3581 Paraíso Das Águas_MS Paraíso Das Águas MS 0 NA
## 3761 Pescaria Brava_SC Pescaria Brava SC 0 NA
## 3821 Pinto Bandeira_RS Pinto Bandeira RS 0 NA
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 472 NA NA NA NA NA
## 3117 NA NA NA NA NA
## 3581 NA NA NA NA NA
## 3761 NA NA NA NA NA
## 3821 NA NA NA NA NA
## IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 472 NA NA NA NA NA NA NA
## 3117 NA NA NA NA NA NA NA
## 3581 NA NA NA NA NA NA NA
## 3761 NA NA NA NA NA NA NA
## 3821 NA NA NA NA NA NA NA
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT
## 472 NA NA NA NA NA NA NA
## 3117 NA NA NA NA NA NA NA
## 3581 NA NA NA NA NA NA NA
## 3761 NA NA NA NA NA NA NA
## 3821 NA NA NA NA NA NA NA
## ALT AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 472 NA 63.43 Sem classificação 2045.03 51257.53 96248.50
## 3117 NA 4988.24 Sem classificação 42123.35 7.20 28168.56
## 3581 NA 5061.43 Sem classificação 210844.60 146514.00 68393.39
## 3761 NA 106.85 Sem classificação 3167.11 5812.35 29.46
## 3821 NA 104.86 Sem classificação 19067.89 4366.36 9652.04
## GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 472 52820.64 202371.69 14863.05 217234.75 12212 17788.63
## 3117 55645.41 133135.10 4177.94 137313.05 15548 8831.56
## 3581 36606.37 462358.36 21594.41 483952.77 5251 92163.92
## 3761 39700.00 78.14 4505.77 82645.86 9908 8341.33
## 3821 14620.12 47.71 4064.74 51771.14 2847 18184.45
## GVA_MAIN
## 472 Demais serviços
## 3117 Administração, defesa, educação e saúde públicas e seguridade social
## 3581 Agricultura, inclusive apoio à agricultura e a pós colheita
## 3761 Administração, defesa, educação e saúde públicas e seguridade social
## 3821 Agricultura, inclusive apoio à agricultura e a pós colheita
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 472 270 1 1 16 0 2 47 112 8 13
## 3117 78 0 0 3 0 0 2 14 6 0
## 3581 129 5 1 0 1 2 9 57 21 7
## 3761 105 1 1 22 0 2 6 36 7 3
## 3821 63 1 0 12 0 0 4 18 7 5
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 472 3 6 11 10 23 2 3 6 1 5
## 3117 0 0 0 2 2 0 41 2 0 6
## 3581 1 0 0 4 9 2 3 2 0 5
## 3761 1 0 1 1 1 2 14 0 1 6
## 3821 0 0 2 2 2 1 1 1 3 4
## COMP_U
## 472 0
## 3117 0
## 3581 0
## 3761 0
## 3821 0
Because these cities are missing so much data, we will remove them: we cannot reliably estimate their populations at the relevant dates unless that data is provided to us. As there are only 5 such cities, the remaining 5,565 are more than sufficient for our analysis.
Brazil_cities_allpop <- Brazil_cities_allGDPC[(is.na(Brazil_cities_allGDPC$IBGE_RES_POP_ESTR))==0,]
summary(Brazil_cities_allpop)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5565 Length:5565
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5559
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median :0.000000 Median : 10934 Median : 10926 Median : 0.0
## Mean :0.004852 Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.:0.000000 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.0
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 3 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 487 1st Qu.: 2801
## Median : 3174 Median : 1846 Median : 931 Median : 6170
## Mean : 10303 Mean : 8859 Mean : 1463 Mean : 27595
## 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1832 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
## NA's :2 NA's :2 NA's :73
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29 Min. : 1 Min. :0.4180
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.:1392 1st Qu.:0.5990
## Median : 3841 Median : 722 Median :2782 Median :0.6650
## Mean : 18212 Mean : 3004 Mean :2783 Mean :0.6592
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012 Max. :5565 Max. :0.8620
## NA's :1 NA's :1
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :1 NA's :1 NA's :1 NA's :2
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5565
## 1st Qu.:-22.838 1st Qu.: 169.7 1st Qu.: 204.53 Class :character
## Median :-18.090 Median : 406.5 Median : 416.59 Mode :character
## Mean :-16.445 Mean : 894.0 Mean : 1515.39
## 3rd Qu.: -8.489 3rd Qu.: 629.0 3rd Qu.: 1025.73
## Max. : 4.585 Max. :874579.0 Max. :159533.33
## NA's :2 NA's :2 NA's :1
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4193 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17260
## Median : 20430 Median : 7425 Median : 31212 Median : 35809
## Mean : 47263 Mean : 176049 Mean : 489855 Mean : 123844
## 3rd Qu.: 51238 3rd Qu.: 41011 3rd Qu.: 115521 3rd Qu.: 89316
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42254 1st Qu.: 1303 1st Qu.: 43706 1st Qu.: 5488
## Median : 119481 Median : 5107 Median : 125111 Median : 11584
## Mean : 833592 Mean : 118962 Mean : 955266 Mean : 37023
## 3rd Qu.: 313988 3rd Qu.: 22209 3rd Qu.: 329717 3rd Qu.: 25102
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5565 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9062 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15866 Mode :character Median : 162.0 Median : 2.00
## Mean : 21119 Mean : 907.5 Mean : 18.27
## 3rd Qu.: 26155 3rd Qu.: 449.0 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.0 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 3.0 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.000 Median : 11.0 Median : 0.0000 Median : 0.00
## Mean : 1.853 Mean : 73.5 Mean : 0.4264 Mean : 2.03
## 3rd Qu.: 2.000 3rd Qu.: 39.0 3rd Qu.: 0.0000 3rd Qu.: 1.00
## Max. :274.000 Max. :31566.0 Max. :332.0000 Max. :657.00
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.00 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.29 Mean : 348.2 Mean : 41.02 Mean : 55.92
## 3rd Qu.: 15.00 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515.00 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.76 Mean : 15.56 Mean : 15.15 Mean : 51.34
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.00 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.00 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.77 Mean : 3.271 Mean : 30.98 Mean : 34.18
## 3rd Qu.: 14.00 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.00 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.19 Mean : 51.65 Mean : 0.05031
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
Brazil_cities_allpop[(is.na(Brazil_cities_allpop$IBGE_DU_RURAL))!=0,]
## CITY_STATE CITY STATE CAPITAL
## 70 Águas De São Pedro_SP Águas De São Pedro SP 0
## 178 Alvorada_RS Alvorada RS 0
## 295 Aracaju_SE Aracaju SE 1
## 296 Araçariguama_SP Araçariguama SP 0
## 385 Armação Dos Búzios_RJ Armação Dos Búzios RJ 0
## 392 Arraial Do Cabo_RJ Arraial Do Cabo RJ 0
## 455 Baía Da Traição_PB Baía Da Traição PB 0
## 468 Balneário Camboriú_SC Balneário Camboriú SC 0
## 559 Barueri_SP Barueri SP 0
## 588 Belford Roxo_RJ Belford Roxo RJ 0
## 593 Belo Horizonte_MG Belo Horizonte MG 1
## 707 Bombinhas_SC Bombinhas SC 0
## 826 Cabedelo_PB Cabedelo PB 0
## 855 Cachoeirinha_RS Cachoeirinha RS 0
## 923 Camaragibe_PE Camaragibe PE 0
## 978 Campo Limpo Paulista_SP Campo Limpo Paulista SP 0
## 1036 Canoas_RS Canoas RS 0
## 1098 Carapicuíba_SP Carapicuíba SP 0
## 1357 Confins_MG Confins MG 0
## 1437 Cotia_SP Cotia SP 0
## 1490 Cubatão_SP Cubatão SP 0
## 1509 Curitiba_PR Curitiba PR 1
## 1552 Diadema_SP Diadema SP 0
## 1657 Embu Das Artes_SP Embu Das Artes SP 0
## 1731 Eusébio_CE Eusébio CE 0
## 1774 Fernando De Noronha_PE Fernando De Noronha PE 0
## 1832 Fortaleza_CE Fortaleza CE 1
## 2037 Guarulhos_SP Guarulhos SP 0
## 2068 Hortolândia_SP Hortolândia SP 0
## 2155 Iguaba Grande_RJ Iguaba Grande RJ 0
## 2166 Ilha Comprida_SP Ilha Comprida SP 0
## 2181 Imbituba_SC Imbituba SC 0
## 2360 Itaparica_BA Itaparica BA 0
## 2376 Itapevi_SP Itapevi SP 0
## 2401 Itaquaquecetuba_SP Itaquaquecetuba SP 0
## 2515 Jandira_SP Jandira SP 0
## 2524 Japeri_RJ Japeri RJ 0
## 2587 Joanópolis_SP Joanópolis SP 0
## 2752 Lauro De Freitas_BA Lauro De Freitas BA 0
## 2783 Lindóia_SP Lindóia SP 0
## 2937 Marcação_PB Marcação PB 0
## 3028 Mauá_SP Mauá SP 0
## 3052 Mesquita_RJ Mesquita RJ 0
## 3241 Natal_RN Natal RN 1
## 3267 Nilópolis_RJ Nilópolis RJ 0
## 3274 Niterói_RJ Niterói RJ 0
## 3470 Osasco_SP Osasco SP 0
## 3500 Pacaraima_RR Pacaraima RR 0
## 3624 Parnamirim_RN Parnamirim RN 0
## 3669 Paulista_PE Paulista PE 0
## 3804 Pinhais_PR Pinhais PR 0
## 3828 Piracaia_SP Piracaia SP 0
## 3850 Pirapora Do Bom Jesus_SP Pirapora Do Bom Jesus SP 0
## 3949 Porto Alegre_RS Porto Alegre RS 1
## 4002 Praia Grande_SP Praia Grande SP 0
## 4074 Queimados_RJ Queimados RJ 0
## 4112 Recife_PE Recife PE 1
## 4179 Ribeirão Pires_SP Ribeirão Pires SP 0
## 4209 Rio De Janeiro_RJ Rio De Janeiro RJ 1
## 4225 Rio Grande Da Serra_SP Rio Grande Da Serra SP 0
## 4378 Santa Cruz De Minas_MG Santa Cruz De Minas MG 0
## 4505 Santana De Parnaíba_SP Santana De Parnaíba SP 0
## 4537 Santo André_SP Santo André SP 0
## 4608 São Caetano Do Sul_SP São Caetano Do Sul SP 0
## 4707 São João De Meriti_RJ São João De Meriti RJ 0
## 4807 São Lourenço_MG São Lourenço MG 0
## 5117 Taboão Da Serra_SP Taboão Da Serra SP 0
## 5367 Uiramutã_RR Uiramutã RR 0
## 5431 Valparaíso De Goiás_GO Valparaíso De Goiás GO 0
## 5443 Vargem Grande Paulista_SP Vargem Grande Paulista SP 0
## 5459 Várzea Paulista_SP Várzea Paulista SP 0
## 5486 Vespasiano_MG Vespasiano MG 0
## 5537 Vitória_ES Vitória ES 1
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## 70 2707 2693 14 990 990
## 178 195673 195483 190 60221 60221
## 295 571149 570674 475 169830 169830
## 296 17080 16964 116 4940 4940
## 385 27560 27073 487 9030 9030
## 392 27715 27655 60 8940 8940
## 455 8012 8005 7 875 875
## 468 108089 107010 1079 39333 39333
## 559 240749 239837 912 71821 71821
## 588 469332 468931 401 145726 145726
## 593 2375151 2369063 6088 762924 762924
## 707 14293 14140 153 4627 4627
## 826 57944 57913 31 17180 17180
## 855 118278 118170 108 38888 38888
## 923 144466 144374 92 42291 42291
## 978 74074 73873 201 22030 22030
## 1036 323827 323280 547 103963 103963
## 1098 369584 368853 731 108676 108676
## 1357 5936 5930 6 1696 1696
## 1437 201150 199966 1184 59017 59017
## 1490 118720 118537 183 36417 36417
## 1509 1751907 1743036 8871 576347 576347
## 1552 386089 385274 815 117397 117397
## 1657 240230 239773 457 68210 68210
## 1731 46033 45936 97 12713 12713
## 1774 2630 2630 0 590 590
## 1832 2452185 2449109 3076 711478 711478
## 2037 1221979 1216222 5757 360748 360748
## 2068 192692 192329 363 55430 55430
## 2155 22851 22729 122 7582 7582
## 2166 9025 8979 46 3125 3125
## 2181 40170 40046 124 13186 13186
## 2360 20725 20685 40 6364 6364
## 2376 200769 200557 212 57634 57634
## 2401 321770 321071 699 89751 89751
## 2515 108344 108086 258 32545 32545
## 2524 95492 95424 68 28424 28424
## 2587 11768 11715 53 3895 3895
## 2752 163449 162510 939 49533 49533
## 2783 6712 6699 13 2221 2221
## 2937 7609 7609 0 NA NA
## 3028 417064 416393 671 125423 125423
## 3052 168376 168020 356 53117 53117
## 3241 803739 802686 1053 235720 235720
## 3267 157425 156947 478 50521 50521
## 3274 487562 483821 3741 169306 169306
## 3470 666740 664447 2293 202009 202009
## 3500 10433 10381 52 354 354
## 3624 202456 202099 357 60388 60388
## 3669 300466 300293 173 90683 90683
## 3804 117008 116814 194 35572 35572
## 3828 25116 25045 71 7825 7825
## 3850 15733 15716 17 4389 4389
## 3949 1409351 1403450 5901 508503 508503
## 4002 262051 260454 1597 83597 83597
## 4074 137962 137864 98 42248 42248
## 4112 1537704 1535289 2415 471252 471252
## 4179 113068 112613 455 33819 33819
## 4209 6320446 6264915 55531 2147235 2147235
## 4225 43974 43936 38 13207 13207
## 4378 7865 7862 3 2520 2520
## 4505 108813 107879 934 31630 31630
## 4537 676407 672359 4048 216343 216343
## 4608 149263 147306 1957 50518 50518
## 4707 458673 457807 866 147516 147516
## 4807 41657 41405 252 13662 13662
## 5117 244528 243903 625 72337 72337
## 5367 8375 8375 0 NA NA
## 5431 132982 132857 125 39440 39440
## 5443 42997 42795 202 12545 12545
## 5459 107089 107047 42 31607 31607
## 5486 104527 104479 48 29820 29820
## 5537 327801 326735 1066 108502 108502
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59
## 70 NA 2687 19 95 135 180 1592
## 178 NA 194483 2857 11832 16525 19930 126085
## 295 NA 566369 7978 32261 39994 46322 388256
## 296 NA 11232 201 808 970 1067 7458
## 385 NA 27400 414 1507 2090 2510 18593
## 392 NA 24002 313 1241 1734 2064 15468
## 455 NA 3076 37 187 272 312 1918
## 468 NA 90291 974 3692 4664 5793 63678
## 559 NA 235508 3621 13818 18217 21665 161351
## 588 NA 468215 6431 26573 37438 46131 307093
## 593 NA 2263631 25105 99493 135491 160227 1553724
## 707 NA 5138 73 279 337 402 3549
## 826 NA 55820 934 3428 4449 4775 37322
## 855 NA 113673 1388 5718 7919 9584 76735
## 923 NA 130739 1711 7083 9997 11106 88037
## 978 NA 72014 1044 4131 5264 6142 48767
## 1036 NA 317311 4124 17051 22966 26445 210363
## 1098 NA 346271 5075 20476 26844 30076 235128
## 1357 NA 5396 61 258 384 481 3668
## 1437 NA 174780 2631 10229 13711 15747 118610
## 1490 NA 69434 801 3460 4658 5294 48044
## 1509 NA 1688975 21318 81910 107086 124078 1162399
## 1552 NA 349014 4962 19917 26172 29822 240126
## 1657 NA 226818 3480 13879 18333 21501 153634
## 1731 NA 41441 716 2709 3619 4451 26865
## 1774 NA 2147 41 137 154 139 1599
## 1832 NA 2366137 32011 129766 169050 199714 1604565
## 2037 NA 1036178 14365 58730 78414 90184 701841
## 2068 NA 187164 2665 10742 14380 17134 127766
## 2155 NA 22836 240 1067 1543 1967 14311
## 2166 NA 5177 55 259 391 462 3151
## 2181 NA 38550 410 1853 2583 3149 25452
## 2360 NA 19132 265 1139 1616 1925 12289
## 2376 NA 196067 3118 13035 17401 20070 129795
## 2401 NA 314720 5082 20794 28493 32792 208097
## 2515 NA 107719 1693 6560 8681 10081 73926
## 2524 NA 88906 1261 5580 7664 9076 57539
## 2587 NA 7995 85 430 518 553 5285
## 2752 NA 151542 2199 8813 11662 13234 104619
## 2783 NA 5437 71 310 342 428 3572
## 2937 NA 2838 45 211 277 266 1701
## 3028 NA 376982 5117 20512 26902 32168 259445
## 3052 NA 167162 2100 8849 12068 14734 109979
## 3241 NA 790062 10364 41558 54619 64785 536124
## 3267 NA 155579 1755 7410 10445 12252 102966
## 3274 NA 409668 3643 14642 19875 23912 271260
## 3470 NA 616068 8089 32305 42733 49379 420590
## 3500 NA 4481 87 400 506 567 2714
## 3624 NA 201036 3041 11801 15437 17684 138581
## 3669 NA 250978 3236 12964 17685 20225 170356
## 3804 NA 115412 1753 6613 9072 10239 78169
## 3828 NA 20157 228 963 1310 1728 13233
## 3850 NA 2785 37 124 178 256 1802
## 3949 NA 1339712 15235 58369 79310 93989 889503
## 4002 NA 249407 3437 14139 18886 21424 159645
## 4074 NA 133313 1866 7824 10858 13204 87112
## 4112 NA 1157593 13606 54720 73132 84879 782716
## 4179 NA 108060 1257 5224 7114 8531 73953
## 4209 NA 5426838 58958 235380 321084 382267 3559037
## 4225 NA 43776 598 2700 3586 4132 29340
## 4378 NA 7861 96 398 612 678 5307
## 4505 NA 76030 1020 4253 5916 6877 51739
## 4537 NA 645047 7233 29315 38363 44476 436194
## 4608 NA 148474 1336 5477 7290 8596 97726
## 4707 NA 446505 5673 23189 32842 39515 294332
## 4807 NA 40784 486 1939 2660 3228 26576
## 5117 NA 241855 3681 14077 18533 21480 164970
## 5367 NA 794 19 83 129 110 424
## 5431 NA 129290 2232 9321 11738 12659 87096
## 5443 NA 42806 646 2566 3393 3919 28596
## 5459 NA 103400 1546 5931 7728 8767 71395
## 5486 NA 84080 1252 4932 6824 7988 56595
## 5537 NA 299922 3494 13665 17384 20693 207460
## IBGE_60+ IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao
## 70 666 2 0.850 0.849 0.890 0.825
## 178 17254 1952 0.700 0.694 0.874 0.564
## 295 51558 230 0.770 0.784 0.823 0.708
## 296 728 1784 0.700 0.717 0.814 0.597
## 385 2286 1086 0.728 0.750 0.824 0.624
## 392 3182 940 0.733 0.722 0.805 0.677
## 455 350 4614 0.581 0.541 0.731 0.495
## 468 11490 4 0.845 0.854 0.894 0.789
## 559 16836 91 0.790 0.791 0.866 0.708
## 588 44549 2358 0.684 0.662 0.808 0.598
## 593 289591 20 0.810 0.841 0.856 0.737
## 707 498 122 0.780 0.753 0.864 0.732
## 826 4912 595 0.748 0.782 0.822 0.651
## 855 12329 435 0.757 0.749 0.857 0.675
## 923 12805 2152 0.690 0.656 0.805 0.628
## 978 6666 243 0.769 0.733 0.840 0.739
## 1036 36362 552 0.750 0.768 0.864 0.636
## 1098 28672 578 0.750 0.721 0.842 0.693
## 1357 544 613 0.747 0.706 0.830 0.711
## 1437 13852 138 0.780 0.789 0.851 0.707
## 1490 7177 856 0.737 0.716 0.821 0.681
## 1509 192184 10 0.823 0.850 0.855 0.768
## 1552 28015 425 0.757 0.717 0.844 0.716
## 1657 15991 915 0.735 0.700 0.839 0.676
## 1731 3081 1877 0.701 0.700 0.794 0.621
## 1774 77 83 0.790 0.781 0.839 0.748
## 1832 231031 467 0.754 0.749 0.824 0.695
## 2037 92644 334 0.763 0.746 0.831 0.717
## 2068 14477 447 0.760 0.716 0.859 0.703
## 2155 3708 363 0.760 0.744 0.841 0.704
## 2166 859 1171 0.725 0.696 0.823 0.666
## 2181 5103 293 0.765 0.734 0.868 0.703
## 2360 1898 2670 0.670 0.657 0.826 0.553
## 2376 12648 900 0.735 0.687 0.855 0.677
## 2401 19462 1510 0.710 0.665 0.844 0.648
## 2515 6778 375 0.760 0.738 0.841 0.706
## 2524 7786 2935 0.659 0.637 0.809 0.555
## 2587 1124 1967 0.700 0.707 0.824 0.585
## 2752 11015 482 0.754 0.781 0.827 0.663
## 2783 714 730 0.742 0.722 0.864 0.654
## 2937 338 5404 0.529 0.525 0.691 0.408
## 3028 32838 276 0.770 0.721 0.852 0.733
## 3052 19432 850 0.737 0.704 0.839 0.678
## 3241 82612 325 0.763 0.768 0.835 0.694
## 3267 20751 498 0.753 0.731 0.817 0.716
## 3274 76336 7 0.840 0.887 0.854 0.773
## 3470 62972 174 0.780 0.776 0.840 0.718
## 3500 207 3132 0.650 0.624 0.788 0.558
## 3624 14492 278 0.770 0.750 0.825 0.726
## 3669 26512 977 0.732 0.673 0.830 0.703
## 3804 9566 530 0.750 0.761 0.836 0.666
## 3828 2695 816 0.740 0.758 0.851 0.625
## 3850 388 1114 0.727 0.679 0.810 0.698
## 3949 203306 32 0.805 0.867 0.857 0.702
## 4002 31876 478 0.754 0.744 0.834 0.692
## 4074 12449 2460 0.680 0.659 0.810 0.589
## 4112 148540 215 0.772 0.798 0.825 0.698
## 4179 11981 101 0.784 0.749 0.847 0.760
## 4209 870112 46 0.799 0.840 0.845 0.719
## 4225 3420 576 0.750 0.684 0.823 0.745
## 4378 770 1755 0.706 0.660 0.839 0.636
## 4505 6225 16 0.810 0.876 0.849 0.725
## 4537 89466 14 0.815 0.819 0.861 0.769
## 4608 28049 1 0.862 0.891 0.887 0.811
## 4707 50954 1341 0.720 0.693 0.831 0.646
## 4807 5895 391 0.759 0.746 0.871 0.673
## 5117 19114 240 0.769 0.742 0.863 0.710
## 5367 29 5561 0.453 0.439 0.766 0.276
## 5431 6244 632 0.746 0.733 0.815 0.695
## 5443 3686 237 0.770 0.755 0.884 0.683
## 5459 8033 383 0.759 0.720 0.863 0.705
## 5486 6489 2242 0.688 0.677 0.811 0.592
## 5537 37226 5 0.845 0.876 0.855 0.805
## LONG LAT ALT AREA RURAL_URBAN GVA_AGROPEC
## 70 -47.88397 -22.597340 515.24 3.61 Intermediário Adjacente 0.00
## 178 -51.07773 -29.997493 13.44 71.60 Urbano 1379.14
## 295 -37.04821 -10.907216 4.29 182.16 Urbano 2.68
## 296 -47.07155 -23.430041 710.68 145.20 Intermediário Adjacente 1213.11
## 385 -41.88775 -22.757764 10.97 70.98 Urbano 8563.51
## 392 -42.02834 -22.967638 8.84 152.11 Urbano 15182.40
## 455 -34.94993 -6.679529 8.14 102.64 Rural Adjacente 14355.43
## 468 -48.63462 -26.991819 9.05 45.21 Urbano 7.56
## 559 -46.87465 -23.508902 741.57 65.70 Urbano 315.72
## 588 -43.39962 -22.764556 17.90 78.99 Urbano 2679.25
## 593 -43.92645 -19.937524 937.53 331.40 Urbano 2300.08
## 707 -48.52135 -27.144255 73.30 35.14 Urbano 17434.02
## 826 -34.83943 -6.966983 4.58 29.76 Urbano 7.13
## 855 -51.09368 -29.950629 14.16 43.90 Urbano 1109.77
## 923 -34.99572 -8.020522 43.12 51.26 Urbano 7987.96
## 978 -46.76382 -23.209396 765.88 79.40 Urbano 7.87
## 1036 -51.18103 -29.918697 19.40 130.79 Urbano 6887.23
## 1098 -46.84145 -23.535249 785.34 34.55 Urbano 225.52
## 1357 -43.99560 -19.629948 767.79 42.36 Urbano 418.24
## 1437 -46.93185 -23.603514 850.25 323.99 Urbano 22.25
## 1490 -46.42003 -23.883839 6.88 142.88 Urbano 902.06
## 1509 -49.27185 -25.432956 910.89 435.04 Urbano 11206.58
## 1552 -46.62338 -23.689295 812.84 30.73 Urbano 0.77
## 1657 -46.85086 -23.647313 791.83 70.40 Urbano 270.62
## 1731 -38.44512 -3.886973 33.06 79.01 Urbano 19350.98
## 1774 -32.43519 -3.852021 0.00 18.61 Rural Remoto 484.83
## 1832 -38.58993 -3.723805 29.91 312.41 Urbano 47368.39
## 2037 -46.53108 -23.468506 776.36 318.68 Urbano 40225.68
## 2068 -47.22110 -22.858395 584.89 62.42 Urbano 1117.40
## 2155 -42.22212 -22.839057 7.03 50.54 Urbano 2872.62
## 2166 -47.55432 -24.739240 7.93 196.57 Intermediário Adjacente 3788.65
## 2181 -48.66928 -28.239951 22.67 182.91 Urbano 27516.95
## 2360 -38.68403 -12.881489 3.67 118.04 Urbano 7265.47
## 2376 -46.93337 -23.546934 743.05 82.66 Urbano 248.07
## 2401 -46.35160 -23.476897 762.25 82.62 Urbano 10994.59
## 2515 -46.90522 -23.529939 755.57 17.45 Urbano 496.13
## 2524 -43.65379 -22.644819 32.94 81.70 Urbano 7.73
## 2587 -46.27342 -22.930678 924.36 374.29 Rural Adjacente 27249.17
## 2752 -38.32346 -12.896718 18.15 57.66 Urbano 2219.82
## 2783 -46.66148 -22.520488 717.27 48.76 Intermediário Adjacente 5.86
## 2937 -35.01392 -6.770054 92.93 123.83 Rural Adjacente 23738.38
## 3028 -46.45826 -23.669335 789.33 61.91 Urbano 609.66
## 3052 -43.42922 -22.768088 20.71 41.49 Urbano 2592.96
## 3241 -35.25225 -5.750899 44.47 167.40 Urbano 17.08
## 3267 -43.41661 -22.807514 19.13 19.39 Urbano 0.00
## 3274 -43.07582 -22.896452 117.78 133.76 Urbano 17659.85
## 3470 -46.78881 -23.533612 742.97 64.95 Urbano 947.16
## 3500 -61.14731 4.475259 912.13 8028.48 Rural Remoto 7151.02
## 3624 -35.25921 -5.910370 55.50 124.01 Urbano 24157.84
## 3669 -34.88479 -7.943188 19.58 96.85 Urbano 11249.65
## 3804 -49.19920 -25.442198 880.64 60.87 Urbano 1323.71
## 3828 -46.35876 -23.050499 793.71 385.57 Intermediário Adjacente 24998.16
## 3850 -47.00097 -23.397523 705.51 108.49 Urbano 0.16
## 3949 -51.22866 -30.030037 42.24 495.39 Urbano 28354.58
## 4002 -46.41205 -24.003021 8.68 149.25 Urbano 3653.34
## 4074 -43.55567 -22.717430 33.15 75.70 Urbano 2996.82
## 4112 -34.88894 -8.062762 10.33 218.84 Urbano 34667.91
## 4179 -46.41534 -23.707423 757.08 99.08 Urbano 1821.37
## 4209 -43.22788 -22.876652 11.80 1200.26 Urbano 81.37
## 4225 -46.39369 -23.744515 762.98 36.34 Urbano 0.68
## 4378 -44.22326 -21.119471 908.71 3.57 Urbano 218.05
## 4505 -46.92209 -23.449453 769.83 179.95 Urbano 94613.78
## 4537 -46.53087 -23.657510 764.10 175.78 Urbano 1012.60
## 4608 -46.57151 -23.614705 754.99 15.33 Urbano 23.52
## 4707 -43.37188 -22.802331 15.56 35.22 Urbano 1034.78
## 4807 -45.05337 -22.117769 888.72 58.02 Urbano 1715.32
## 5117 -46.78578 -23.623328 803.24 20.39 Urbano 180.41
## 5367 -60.19572 4.585440 605.80 8065.56 Rural Remoto 9864.83
## 5431 -47.98411 -16.069575 1105.85 60.95 Urbano 0.43
## 5443 -47.01965 -23.615302 926.93 42.49 Urbano 32189.74
## 5459 -46.82989 -23.214467 729.74 35.12 Urbano 1872.03
## 5486 -43.91992 -19.693030 679.03 71.08 Urbano 791.68
## 5537 -40.32221 -20.320154 0.00 97.12 Urbano 14437.58
## GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## 70 10504.94 94367.21 20211.04 125083.20 7533.36
## 178 450.98 1108.76 749842.52 2310961.10 212178.51
## 295 2624804.03 9244912.10 2652.55 14524947.50 1973534.61
## 296 626413.92 1010191.11 104138.74 1741956.88 379.54
## 385 285.57 819191.05 284130.31 1397458.05 77485.55
## 392 87133.35 276890.43 242622.01 621828.18 33333.00
## 455 2162.40 18541.17 39788.97 74847.96 2.56
## 468 671328.06 3006137.08 757157.55 4442184.84 488228.41
## 559 4212.89 28113200.81 1747221.73 34073626.94 13014674.64
## 588 988889.97 3428106.83 3073707.32 7493383.37 790665.42
## 593 11901585.70 53213121.50 10664796.91 75781.80 12495.66
## 707 77136.54 344876.47 105221.09 544.67 48297.74
## 826 426650.48 1283696.98 373485.12 2090.96 397320.13
## 855 1079.27 2399310.59 578588.41 4058276.69 1206.66
## 923 75798.16 739599.70 509551.68 1332937.50 142773.53
## 978 378706.96 811522.38 287912.58 1486007.07 189987.65
## 1036 8291607.70 7300984.36 1599697.49 17199176.79 2329361.47
## 1098 814649.19 2891957.81 1059656.47 4766488.99 447.62
## 1357 65122.69 819.12 39.40 924053.76 82.96
## 1437 2677627.60 5308453.91 894.96 8903289.39 2088168.40
## 1490 10055568.72 4752510.12 795266.65 15604247.55 2063753.92
## 1509 12802824.51 46249176.62 8813208.36 67876.42 15912.49
## 1552 3547136.52 5907985.66 1541552.72 10997.44 2232.30
## 1657 1560323.83 5720.78 843771.86 8125147.04 1879499.86
## 1731 1038868.71 1110685.38 254682.16 2423587.22 644359.77
## 1774 5392.96 95943.85 7.26 109079.16 15244.51
## 1832 9060367.79 35008332.72 8020931.83 52137000.73 8004144.47
## 2037 11091048.69 29551381.17 4840743.94 45523399.48 8451519.21
## 2068 3945253.89 5001.13 841536.81 9789041.14 1726030.36
## 2155 35.90 169977.75 198551.93 407.30 19.63
## 2166 268191.60 189875.31 84032.76 545888.32 9339.32
## 2181 160695.15 773.40 188791.13 1150399.50 213135.17
## 2360 21341.65 93112.91 74443.99 196164.02 13710.64
## 2376 3286882.77 6090.88 806.50 10184508.99 1963153.38
## 2401 1389143.49 3194865.51 1162896.21 5757899.80 749.79
## 2515 717358.59 1612719.13 407.71 2738284.17 681348.11
## 2524 133350.13 420982.38 636777.17 1198835.42 95810.65
## 2587 16273.71 100406.88 47870.14 191799.90 10375.20
## 2752 1225025.66 3357161.11 665466.27 5249.87 854208.18
## 2783 35331.54 66188.94 32392.80 139770.45 11974.48
## 2937 1724.29 11192.86 37551.82 74207.34 1436.36
## 3028 4373.72 6294256.62 1348676.31 12017265.64 1946580.85
## 3052 139944.30 953082.06 1086298.23 2181917.56 109216.36
## 3241 2932165.56 12122912.17 3849761.91 18921922.95 2923557.74
## 3267 149020.37 1290.53 957704.94 2397252.66 143063.11
## 3274 4391714.38 12624595.99 3677333.63 20711303.86 2292039.48
## 3470 3036.39 53111430.66 2627577.24 58776349.73 15626341.33
## 3500 8161.28 29966.90 114622.75 159901.95 4616.44
## 3624 674.82 2422.88 1108079.13 4229938.39 792553.72
## 3669 583029.25 1786743.49 1045.74 3426760.35 389256.87
## 3804 1082404.71 2837.23 534458.77 4455414.76 931155.45
## 3828 71238.25 216934.61 98015.49 411186.50 32706.56
## 3850 48176.37 131861.11 76436.12 256631.46 11722.54
## 3949 6768083.47 48930408.04 6712383.63 62439229.72 10986034.54
## 4002 735.55 3739905.66 1301224.94 5780331.57 400744.30
## 4074 1077940.68 2045289.03 944494.99 4070721.52 599495.51
## 4112 5929707.87 29628615.15 6143514.39 41736505.32 7807.58
## 4179 700.98 1571975.50 431476.15 2706251.06 315587.79
## 4209 36334430.50 177361095.84 47548.35 261325243.88 68106.12
## 4225 162057.04 223.38 149879.29 536005.86 42080.68
## 4378 5.18 36085.18 29902.03 71382.30 3498.58
## 4505 1743146.63 4593.73 658868.33 7090362.10 1394975.41
## 4537 4327.28 15730746.19 2464771.07 22523809.03 3313237.11
## 4608 2950.35 6765.71 980547.41 10696632.39 2590079.00
## 4707 485315.97 5341929.53 2688502.16 8516782.44 894.03
## 4807 63393.89 576631.22 193424.86 835165.29 81645.86
## 5117 2173940.31 4103684.30 885803.95 7163608.97 1186413.66
## 5367 1189.55 4.75 87.28 103089.25 0.59
## 5431 287258.37 1245114.13 551217.67 2084016.30 215292.65
## 5443 383983.22 876724.73 194315.57 1487213.26 258218.45
## 5459 680125.23 965083.82 391572.39 2038653.47 322027.69
## 5486 758627.29 1142201.22 471306.97 2372.93 446829.89
## 5537 3225072.88 11635463.05 1744085.64 16619059.14 5108035.55
## GDP POP_GDP GDP_CAPITA
## 70 132616.56 3205 41378.02
## 178 2523139.61 207392 12166.04
## 295 16498482.10 641523 25717.68
## 296 2121496.97 20581 103080.36
## 385 1474943.60 31674 46566.38
## 392 655161.18 29077 22531.94
## 455 77405.46 8951 8647.69
## 468 4930413.26 131727 37429.03
## 559 47088301.58 264935 177735.30
## 588 8284048.78 494141 16764.54
## 593 88277462.53 2513451 35122.01
## 707 592.97 18052 32847.65
## 826 2488279.38 66858 37217.38
## 855 5264940.27 126666 41565.54
## 923 1475711.03 155228 9506.73
## 978 1675994.72 81693 20515.77
## 1036 19528538.26 342634 56995.33
## 1098 5214112.51 394465 13218.19
## 1357 1007.01 6545 153860.05
## 1437 10991457.80 233696 47033.14
## 1490 17668.00 127887 138153.22
## 1509 83788.90 1893997 44239.20
## 1552 13229.74 415180 31865.08
## 1657 10004646.90 264448 37832.19
## 1731 3067946.99 51913 59097.86
## 1774 124.32 2974 41803.52
## 1832 60141145.20 2609716 23045.09
## 2037 53974918.69 1337087 40367.54
## 2068 11515071.50 219039 52570.87
## 2155 426.93 26430 16153.08
## 2166 555227.64 10476 52999.97
## 2181 1363534.67 43624 31256.53
## 2360 209874.65 22744 9227.69
## 2376 12147662.36 226488 53634.91
## 2401 6507690.31 356774 18240.37
## 2515 3419632.28 120177 28454.96
## 2524 1294646.07 100562 12874.11
## 2587 202175.10 12837 15749.40
## 2752 6104081.03 194641 31360.72
## 2783 151744.92 7591 19990.11
## 2937 75643.70 8475 8925.51
## 3028 13963846.49 457696 30509.00
## 3052 2291133.91 171020 13396.88
## 3241 21845480.68 877662 24890.54
## 3267 2540315.77 158319 16045.55
## 3274 23003343.34 497883 46202.31
## 3470 74402691.05 696382 106841.78
## 3500 164518.39 12144 13547.30
## 3624 5022492.12 248623 20201.24
## 3669 3816.02 325590 11720.31
## 3804 5386570.20 128256 41998.58
## 3828 443893.06 26841 16537.87
## 3850 268353.99 17913 14980.96
## 3949 73425.26 1481019 49577.53
## 4002 6181.08 304705 20285.44
## 4074 4670217.02 144525 32314.25
## 4112 49544087.54 1625583 30477.73
## 4179 3021838.84 121130 24947.07
## 4209 329431359.90 6498837 50690.82
## 4225 578086.55 48861 11831.25
## 4378 74.88 8489 8820.93
## 4505 8485.34 129261 65644.99
## 4537 25837046.14 712749 36249.85
## 4608 13286.71 158825 83656.30
## 4707 9410.81 460541 20434.26
## 4807 916811.15 45128 20315.79
## 5117 8350022.63 275948 30259.41
## 5367 103680.32 9664 10728.51
## 5431 2299308.95 156419 14699.68
## 5443 1745431.72 49542 35231.35
## 5459 2360681.16 117772 20044.50
## 5486 2819757.05 120510 23398.53
## 5537 21727094.68 359555 60427.74
## GVA_MAIN
## 70 Demais serviços
## 178 Demais serviços
## 295 Demais serviços
## 296 Demais serviços
## 385 Demais serviços
## 392 Administração, defesa, educação e saúde públicas e seguridade social
## 455 Administração, defesa, educação e saúde públicas e seguridade social
## 468 Demais serviços
## 559 Demais serviços
## 588 Administração, defesa, educação e saúde públicas e seguridade social
## 593 Demais serviços
## 707 Demais serviços
## 826 Comércio e reparação de veículos automotores e motocicletas
## 855 Demais serviços
## 923 Demais serviços
## 978 Demais serviços
## 1036 Indústrias de transformação
## 1098 Demais serviços
## 1357 Demais serviços
## 1437 Demais serviços
## 1490 Indústrias de transformação
## 1509 Demais serviços
## 1552 Demais serviços
## 1657 Comércio e reparação de veículos automotores e motocicletas
## 1731 Demais serviços
## 1774 Demais serviços
## 1832 Demais serviços
## 2037 Demais serviços
## 2068 Demais serviços
## 2155 Administração, defesa, educação e saúde públicas e seguridade social
## 2166 Indústrias extrativas
## 2181 Demais serviços
## 2360 Administração, defesa, educação e saúde públicas e seguridade social
## 2376 Demais serviços
## 2401 Demais serviços
## 2515 Demais serviços
## 2524 Administração, defesa, educação e saúde públicas e seguridade social
## 2587 Demais serviços
## 2752 Demais serviços
## 2783 Demais serviços
## 2937 Administração, defesa, educação e saúde públicas e seguridade social
## 3028 Demais serviços
## 3052 Administração, defesa, educação e saúde públicas e seguridade social
## 3241 Demais serviços
## 3267 Demais serviços
## 3274 Demais serviços
## 3470 Demais serviços
## 3500 Administração, defesa, educação e saúde públicas e seguridade social
## 3624 Demais serviços
## 3669 Demais serviços
## 3804 Demais serviços
## 3828 Demais serviços
## 3850 Demais serviços
## 3949 Demais serviços
## 4002 Demais serviços
## 4074 Demais serviços
## 4112 Demais serviços
## 4179 Demais serviços
## 4209 Demais serviços
## 4225 Demais serviços
## 4378 Demais serviços
## 4505 Demais serviços
## 4537 Demais serviços
## 4608 Demais serviços
## 4707 Demais serviços
## 4807 Demais serviços
## 5117 Demais serviços
## 5367 Administração, defesa, educação e saúde públicas e seguridade social
## 5431 Demais serviços
## 5443 Demais serviços
## 5459 Demais serviços
## 5486 Demais serviços
## 5537 Demais serviços
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 70 202 2 1 4 0 1 6 99 3 40
## 178 2994 3 2 382 0 9 309 1324 180 152
## 295 14534 23 14 721 5 21 717 5262 358 1078
## 296 640 2 7 74 0 2 36 241 53 42
## 385 1674 1 1 30 0 5 55 520 27 512
## 392 648 1 1 14 0 8 53 198 27 103
## 455 85 0 0 0 0 1 0 39 1 10
## 468 9507 2 1 359 0 14 821 3064 151 958
## 559 12513 14 7 711 9 34 713 3432 661 816
## 588 3751 1 2 286 0 14 251 1581 165 172
## 593 103867 226 139 5221 96 156 6235 28240 2733 6514
## 707 1447 0 0 64 0 5 150 386 18 349
## 826 1370 1 1 129 0 7 115 482 40 73
## 855 4453 4 0 652 0 10 319 1772 323 218
## 923 1328 2 0 127 0 4 42 659 28 85
## 978 1198 16 0 151 0 2 76 459 52 81
## 1036 11222 3 7 998 2 35 739 4377 922 563
## 1098 5413 3 0 389 0 19 390 2365 351 367
## 1357 201 1 0 29 0 0 12 67 15 19
## 1437 7630 23 2 640 0 21 416 2441 363 387
## 1490 1856 0 0 61 1 7 155 648 218 194
## 1509 101929 227 34 6025 109 163 6373 33566 3873 5855
## 1552 8064 3 0 1550 0 30 352 3080 492 504
## 1657 3708 13 3 245 1 24 256 1597 213 231
## 1731 2387 11 2 288 0 14 203 732 74 55
## 1774 236 2 0 4 0 0 1 39 9 124
## 1832 57476 132 30 5739 39 111 3008 20909 1627 3744
## 2037 28349 35 8 2847 1 111 1304 11186 2381 1879
## 2068 4324 10 0 380 0 13 379 1747 261 264
## 2155 331 1 0 13 0 1 20 137 5 16
## 2166 292 1 0 7 0 1 15 144 3 64
## 2181 1548 4 9 106 0 6 92 620 125 219
## 2360 163 0 0 4 0 1 5 84 3 24
## 2376 2399 3 0 168 0 6 192 973 155 147
## 2401 4336 34 3 647 0 27 275 1902 224 237
## 2515 1967 0 0 204 0 5 162 752 172 104
## 2524 614 1 4 44 0 3 47 241 34 33
## 2587 522 39 0 58 0 0 13 229 21 35
## 2752 7118 14 6 575 0 14 572 2500 207 359
## 2783 262 3 1 38 0 0 23 84 5 32
## 2937 36 2 0 2 0 0 0 15 1 1
## 3028 6448 3 1 875 1 37 406 2515 285 403
## 3052 1468 1 0 126 0 5 86 641 51 86
## 3241 21530 57 16 1156 21 37 1770 7592 382 1682
## 3267 2022 2 0 127 0 2 98 857 62 157
## 3274 17097 13 9 603 5 24 869 4623 293 1268
## 3470 15315 11 2 863 1 29 703 6002 1026 1230
## 3500 108 0 0 1 0 0 3 86 3 5
## 3624 4074 10 2 342 0 14 326 1776 131 259
## 3669 3007 10 0 309 1 11 140 1334 65 176
## 3804 5275 5 1 1018 0 21 445 1943 219 275
## 3828 773 32 3 101 0 3 39 305 8 59
## 3850 429 1 5 20 0 0 12 101 10 18
## 3949 80082 196 31 3482 57 95 4039 21550 2523 4205
## 4002 7418 3 0 191 0 14 538 2334 127 608
## 4074 1177 1 2 78 0 6 93 516 44 73
## 4112 40041 84 17 2059 53 63 1967 13147 1176 2882
## 4179 2751 10 1 271 0 5 138 1010 150 174
## 4209 190038 172 274 6824 235 272 7797 47545 4825 12289
## 4225 424 5 0 25 0 3 37 186 29 28
## 4378 205 0 1 76 0 0 3 77 7 9
## 4505 8909 9 6 463 1 14 351 1749 261 178
## 4537 24972 11 1 1723 1 36 1340 8480 999 1571
## 4608 9735 3 0 714 1 7 336 2921 328 679
## 4707 5010 2 0 496 0 20 169 2213 237 320
## 4807 1684 2 6 95 0 2 53 764 33 175
## 5117 5135 3 1 513 0 18 332 2060 199 306
## 5367 8 0 0 0 0 0 0 7 0 0
## 5431 2218 1 0 149 0 8 193 976 39 139
## 5443 1360 2 0 149 0 5 104 502 54 68
## 5459 1975 8 0 388 0 13 134 782 137 105
## 5486 1625 3 3 162 0 8 138 594 57 99
## 5537 17924 30 16 535 4 17 850 4405 323 1185
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 70 5 2 1 9 15 2 3 3 2 4
## 178 46 14 34 82 172 2 97 37 32 117
## 295 349 182 287 1081 1634 38 565 971 265 963
## 296 9 4 17 24 61 2 22 10 9 25
## 385 17 4 35 35 298 3 26 15 26 64
## 392 8 0 9 17 125 4 22 8 7 43
## 455 0 0 0 1 0 3 9 1 0 20
## 468 140 149 482 466 1976 7 174 250 119 374
## 559 859 820 516 1222 1469 4 322 303 129 472
## 588 35 15 18 76 171 3 239 88 54 580
## 593 3951 3501 2785 11925 17752 85 3325 4217 1436 5327
## 707 14 11 57 27 291 3 11 9 15 37
## 826 6 8 13 20 347 4 32 15 20 57
## 855 99 66 63 201 366 2 86 74 36 162
## 923 24 3 11 27 95 6 78 32 19 86
## 978 38 9 18 56 73 2 84 31 10 40
## 1036 271 159 191 656 1053 4 346 268 150 478
## 1098 164 42 31 177 432 2 240 97 53 291
## 1357 2 1 1 14 21 2 2 3 6 6
## 1437 403 174 197 599 915 4 335 194 116 398
## 1490 22 15 11 87 172 4 78 45 34 104
## 1509 4535 3197 2527 9130 12987 89 3030 4197 1307 4697
## 1552 160 93 91 258 620 4 288 114 76 349
## 1657 92 41 47 142 334 4 184 52 54 175
## 1731 60 86 55 217 218 3 57 226 21 65
## 1774 1 0 0 4 34 1 4 0 8 5
## 1832 1247 902 1113 3321 6545 120 2379 2156 849 3504
## 2037 704 398 464 1155 2413 5 992 696 248 1522
## 2068 145 62 49 210 291 3 190 104 46 170
## 2155 7 1 4 7 53 3 23 7 7 26
## 2166 6 0 2 5 18 2 8 6 0 10
## 2181 22 3 21 55 104 2 36 41 24 59
## 2360 0 0 0 2 15 2 8 4 2 9
## 2376 67 21 24 98 221 2 96 59 23 144
## 2401 50 34 39 88 232 3 158 53 40 290
## 2515 61 15 21 91 159 7 53 26 11 124
## 2524 0 2 0 9 26 4 29 16 9 112
## 2587 10 9 12 20 18 3 17 11 5 22
## 2752 162 100 184 647 946 2 193 221 106 310
## 2783 1 4 2 5 35 2 7 4 5 11
## 2937 0 0 0 1 1 2 5 0 0 6
## 3028 148 71 78 218 492 3 289 159 83 381
## 3052 31 7 7 41 107 3 75 30 24 147
## 3241 430 300 551 1454 2453 44 844 1165 333 1243
## 3267 29 10 18 71 153 4 110 99 29 194
## 3274 526 304 408 1421 3144 17 695 1111 322 1442
## 3470 706 271 238 761 1535 6 527 492 142 770
## 3500 0 0 0 1 3 2 0 1 0 3
## 3624 38 23 97 136 399 3 181 96 63 178
## 3669 64 24 28 132 258 4 193 80 38 140
## 3804 138 73 73 279 347 5 120 91 67 155
## 3828 12 17 11 34 53 3 25 24 9 35
## 3850 179 1 1 21 20 3 19 5 5 8
## 3949 3555 2461 1924 8139 16271 72 2217 3489 1339 4429
## 4002 84 49 149 167 2487 2 198 109 88 270
## 4074 16 2 7 31 44 3 49 39 14 159
## 4112 1123 818 767 3101 6353 91 1572 1698 489 2574
## 4179 188 33 24 137 210 2 111 125 36 126
## 4209 9070 6327 4281 19248 34812 120 6744 9905 5039 14224
## 4225 8 4 4 12 18 2 28 11 3 21
## 4378 3 0 1 7 5 1 2 1 3 9
## 4505 1755 443 240 1610 1143 3 164 125 116 278
## 4537 1239 574 450 1798 3303 27 880 1078 282 1179
## 4608 595 258 197 704 1719 5 270 418 110 470
## 4707 46 21 31 140 235 3 263 159 75 580
## 4807 27 24 20 59 157 4 54 110 27 72
## 5117 165 68 81 180 567 3 218 108 44 269
## 5367 0 0 0 0 0 1 0 0 0 0
## 5431 25 23 56 61 208 3 141 67 31 98
## 5443 61 22 21 76 114 2 62 24 16 78
## 5459 32 9 16 54 128 2 66 27 12 62
## 5486 18 6 21 66 128 4 83 143 19 73
## 5537 609 593 539 2207 3129 55 568 1300 230 1329
## COMP_U
## 70 0
## 178 0
## 295 0
## 296 0
## 385 0
## 392 0
## 455 0
## 468 0
## 559 0
## 588 0
## 593 3
## 707 0
## 826 0
## 855 0
## 923 0
## 978 0
## 1036 0
## 1098 0
## 1357 0
## 1437 2
## 1490 0
## 1509 8
## 1552 0
## 1657 0
## 1731 0
## 1774 0
## 1832 1
## 2037 0
## 2068 0
## 2155 0
## 2166 0
## 2181 0
## 2360 0
## 2376 0
## 2401 0
## 2515 0
## 2524 0
## 2587 0
## 2752 0
## 2783 0
## 2937 0
## 3028 0
## 3052 0
## 3241 0
## 3267 0
## 3274 0
## 3470 0
## 3500 0
## 3624 0
## 3669 0
## 3804 0
## 3828 0
## 3850 0
## 3949 8
## 4002 0
## 4074 0
## 4112 7
## 4179 0
## 4209 35
## 4225 0
## 4378 0
## 4505 0
## 4537 0
## 4608 0
## 4707 0
## 4807 0
## 5117 0
## 5367 0
## 5431 0
## 5443 0
## 5459 0
## 5486 0
## 5537 0
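Comparing IBGE_DU with IBGE_DU_URBAN in the rows above shows that, wherever both are recorded, the two values are identical: every domestic unit in these municipalities is classified as urban, so the NA in IBGE_DU_RURAL stands for a missing 0. Two rows (2937, Marcação_PB, and 5367, Uiramutã_RR) are also missing IBGE_DU and IBGE_DU_URBAN, so for them a rural count of 0 is an assumption rather than something the data confirms. We therefore fill every NA in IBGE_DU_RURAL with 0.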
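The comparison of IBGE_DU with IBGE_DU_URBAN can also be checked in code before filling. The following is a minimal sketch on a toy data frame (`toy` and its `ROW` ids are hypothetical; the first three rows copy values from the listing above and the fourth is made up). Unlike the blanket fill used in this exercise, it only fills rows where the total and urban counts are present and equal, and flags the rest.

```r
# Toy stand-in for Brazil_cities_allpop (assumption: not the real data).
# Rows 70, 178 and 2937 copy values from the listing above; row 9999 is made up.
toy <- data.frame(
  ROW           = c(70, 178, 2937, 9999),
  IBGE_DU       = c(990, 60221, NA, 1500),
  IBGE_DU_URBAN = c(990, 60221, NA, 1000),
  IBGE_DU_RURAL = c(NA, NA, NA, 500)
)

rural_na <- is.na(toy$IBGE_DU_RURAL)

# Rural count missing, but total equals urban: the implied rural count is 0.
implied_zero <- rural_na & !is.na(toy$IBGE_DU) &
  toy$IBGE_DU == toy$IBGE_DU_URBAN

# Rural count missing and totals missing too: 0 cannot be confirmed.
unverifiable <- rural_na & is.na(toy$IBGE_DU)

# Fill only the rows where the data supports a value of 0.
toy$IBGE_DU_RURAL[implied_zero] <- 0

print(toy)
print(toy$ROW[unverifiable])
```

On the real data the same logical vectors could be built from Brazil_cities_allpop. The exercise itself proceeds with the simpler fill of every NA: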
Brazil_cities_allpop$IBGE_DU_RURAL[is.na(Brazil_cities_allpop$IBGE_DU_RURAL)] <- 0
summary(Brazil_cities_allpop)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5565 Length:5565
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5559
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median :0.000000 Median : 10934 Median : 10926 Median : 0.0
## Mean :0.004852 Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.:0.000000 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.0
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 0 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 471 1st Qu.: 2801
## Median : 3174 Median : 1846 Median : 918 Median : 6170
## Mean : 10303 Mean : 8859 Mean : 1443 Mean : 27595
## 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1813 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
## NA's :2 NA's :2
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29 Min. : 1 Min. :0.4180
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.:1392 1st Qu.:0.5990
## Median : 3841 Median : 722 Median :2782 Median :0.6650
## Mean : 18212 Mean : 3004 Mean :2783 Mean :0.6592
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012 Max. :5565 Max. :0.8620
## NA's :1 NA's :1
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :1 NA's :1 NA's :1 NA's :2
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5565
## 1st Qu.:-22.838 1st Qu.: 169.7 1st Qu.: 204.53 Class :character
## Median :-18.090 Median : 406.5 Median : 416.59 Mode :character
## Mean :-16.445 Mean : 894.0 Mean : 1515.39
## 3rd Qu.: -8.489 3rd Qu.: 629.0 3rd Qu.: 1025.73
## Max. : 4.585 Max. :874579.0 Max. :159533.33
## NA's :2 NA's :2 NA's :1
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4193 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17260
## Median : 20430 Median : 7425 Median : 31212 Median : 35809
## Mean : 47263 Mean : 176049 Mean : 489855 Mean : 123844
## 3rd Qu.: 51238 3rd Qu.: 41011 3rd Qu.: 115521 3rd Qu.: 89316
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42254 1st Qu.: 1303 1st Qu.: 43706 1st Qu.: 5488
## Median : 119481 Median : 5107 Median : 125111 Median : 11584
## Mean : 833592 Mean : 118962 Mean : 955266 Mean : 37023
## 3rd Qu.: 313988 3rd Qu.: 22209 3rd Qu.: 329717 3rd Qu.: 25102
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5565 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9062 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15866 Mode :character Median : 162.0 Median : 2.00
## Mean : 21119 Mean : 907.5 Mean : 18.27
## 3rd Qu.: 26155 3rd Qu.: 449.0 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.0 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 3.0 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.000 Median : 11.0 Median : 0.0000 Median : 0.00
## Mean : 1.853 Mean : 73.5 Mean : 0.4264 Mean : 2.03
## 3rd Qu.: 2.000 3rd Qu.: 39.0 3rd Qu.: 0.0000 3rd Qu.: 1.00
## Max. :274.000 Max. :31566.0 Max. :332.0000 Max. :657.00
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.00 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.29 Mean : 348.2 Mean : 41.02 Mean : 55.92
## 3rd Qu.: 15.00 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515.00 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.76 Mean : 15.56 Mean : 15.15 Mean : 51.34
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.00 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.00 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.77 Mean : 3.271 Mean : 30.98 Mean : 34.18
## 3rd Qu.: 14.00 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.00 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.19 Mean : 51.65 Mean : 0.05031
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
In this case, IBGE_DU in the reference refers to “Domestic Units”. Upon further investigation, this refers to Permanent Private Housing Units. We determined this by viewing the source of the data and reading the additional description at the top of the webpage. Source: https://sidra.ibge.gov.br/tabela/3495
Unfortunately, the source data does not provide the values we need. However, we can use an alternate data source, the IBGE website report, to find a good estimate of these values. Although the values are not exact because of corrections made later on, checking against cities whose IBGE_DU values are known, such as Petrolina and Sao Paulo, confirms that the data is at least reasonably accurate.
Source: https://cidades.ibge.gov.br/brasil/pb/marcacao/pesquisa/23/25124?tipo=ranking&indicador=29522
From this, we can make a reasonable estimate.
Brazil_cities_allpop[is.na(Brazil_cities_allpop$IBGE_DU), ]
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 2937 Marcação_PB Marcação PB 0 7609 7609
## 5367 Uiramutã_RR Uiramutã RR 0 8375 8375
## IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 2937 0 NA NA 0 2838 45
## 5367 0 NA NA 0 794 19
## IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## 2937 211 277 266 1701 338 5404 0.529
## 5367 83 129 110 424 29 5561 0.453
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT ALT
## 2937 0.525 0.691 0.408 -35.01392 -6.770054 92.93
## 5367 0.439 0.766 0.276 -60.19572 4.585440 605.80
## AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 2937 123.83 Rural Adjacente 23738.38 1724.29 11192.86 37551.82
## 5367 8065.56 Rural Remoto 9864.83 1189.55 4.75 87.28
## GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 2937 74207.34 1436.36 75643.7 8475 8925.51
## 5367 103089.25 0.59 103680.3 9664 10728.51
## GVA_MAIN
## 2937 Administração, defesa, educação e saúde públicas e seguridade social
## 5367 Administração, defesa, educação e saúde públicas e seguridade social
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2937 36 2 0 2 0 0 0 15 1 1
## 5367 8 0 0 0 0 0 0 7 0 0
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2937 0 0 0 1 1 2 5 0 0 6
## 5367 0 0 0 0 0 1 0 0 0 0
## COMP_U
## 2937 0
## 5367 0
Brazil_cities_allpop$IBGE_DU[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 2040
Brazil_cities_allpop$IBGE_DU_URBAN[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 824
Brazil_cities_allpop$IBGE_DU_RURAL[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 1216
Brazil_cities_allpop$IBGE_DU[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 1444
Brazil_cities_allpop$IBGE_DU_URBAN[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 219
Brazil_cities_allpop$IBGE_DU_RURAL[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 1225
summary(Brazil_cities_allpop)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5565 Length:5565
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5559
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median :0.000000 Median : 10934 Median : 10926 Median : 0.0
## Mean :0.004852 Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.:0.000000 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.0
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 0 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 472 1st Qu.: 2801
## Median : 3174 Median : 1844 Median : 919 Median : 6170
## Mean : 10300 Mean : 8856 Mean : 1444 Mean : 27595
## 3rd Qu.: 6725 3rd Qu.: 4621 3rd Qu.: 1813 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
##
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29 Min. : 1 Min. :0.4180
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.:1392 1st Qu.:0.5990
## Median : 3841 Median : 722 Median :2782 Median :0.6650
## Mean : 18212 Mean : 3004 Mean :2783 Mean :0.6592
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012 Max. :5565 Max. :0.8620
## NA's :1 NA's :1
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :1 NA's :1 NA's :1 NA's :2
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5565
## 1st Qu.:-22.838 1st Qu.: 169.7 1st Qu.: 204.53 Class :character
## Median :-18.090 Median : 406.5 Median : 416.59 Mode :character
## Mean :-16.445 Mean : 894.0 Mean : 1515.39
## 3rd Qu.: -8.489 3rd Qu.: 629.0 3rd Qu.: 1025.73
## Max. : 4.585 Max. :874579.0 Max. :159533.33
## NA's :2 NA's :2 NA's :1
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4193 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17260
## Median : 20430 Median : 7425 Median : 31212 Median : 35809
## Mean : 47263 Mean : 176049 Mean : 489855 Mean : 123844
## 3rd Qu.: 51238 3rd Qu.: 41011 3rd Qu.: 115521 3rd Qu.: 89316
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42254 1st Qu.: 1303 1st Qu.: 43706 1st Qu.: 5488
## Median : 119481 Median : 5107 Median : 125111 Median : 11584
## Mean : 833592 Mean : 118962 Mean : 955266 Mean : 37023
## 3rd Qu.: 313988 3rd Qu.: 22209 3rd Qu.: 329717 3rd Qu.: 25102
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5565 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9062 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15866 Mode :character Median : 162.0 Median : 2.00
## Mean : 21119 Mean : 907.5 Mean : 18.27
## 3rd Qu.: 26155 3rd Qu.: 449.0 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.0 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 3.0 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.000 Median : 11.0 Median : 0.0000 Median : 0.00
## Mean : 1.853 Mean : 73.5 Mean : 0.4264 Mean : 2.03
## 3rd Qu.: 2.000 3rd Qu.: 39.0 3rd Qu.: 0.0000 3rd Qu.: 1.00
## Max. :274.000 Max. :31566.0 Max. :332.0000 Max. :657.00
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.00 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.29 Mean : 348.2 Mean : 41.02 Mean : 55.92
## 3rd Qu.: 15.00 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515.00 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.76 Mean : 15.56 Mean : 15.15 Mean : 51.34
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.00 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.00 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.77 Mean : 3.271 Mean : 30.98 Mean : 34.18
## 3rd Qu.: 14.00 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.00 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.19 Mean : 51.65 Mean : 0.05031
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
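The manual fills above can be wrapped in a small helper and followed by a consistency check. This is only a sketch: `patch_row` is a hypothetical helper that does not appear in the original analysis, and the toy frame is illustrative. The check relies on the urban and rural counts adding up to the total, which holds for the estimates used above (824 + 1216 = 2040 and 219 + 1225 = 1444).

```r
# Hypothetical helper: overwrite several columns for the row matching `key`
patch_row <- function(df, key, values, key_col = "CITY_STATE") {
  idx <- which(df[[key_col]] == key)
  stopifnot(length(idx) == 1)   # the key must identify exactly one row
  for (col in names(values)) {
    df[idx, col] <- values[[col]]
  }
  df
}

# Toy data frame (illustrative only)
toy <- data.frame(
  CITY_STATE    = c("A_XX", "B_YY"),
  IBGE_DU       = c(NA, 500),
  IBGE_DU_URBAN = c(NA, 400),
  IBGE_DU_RURAL = c(0, 100),
  stringsAsFactors = FALSE
)

toy <- patch_row(toy, "A_XX",
                 list(IBGE_DU = 2040, IBGE_DU_URBAN = 824, IBGE_DU_RURAL = 1216))

# Consistency check: urban + rural should reproduce the total on every row
consistent <- with(toy, IBGE_DU_URBAN + IBGE_DU_RURAL == IBGE_DU)
```

Stopping when the key matches zero or several rows guards against silently patching the wrong municipality.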
Brazil_cities_allpop[is.na(Brazil_cities_allpop$LONG), ]
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP
## 3806 Pinhal Da Serra_RS Pinhal Da Serra RS 0 2130
## 4490 Santa Terezinha_BA Santa Terezinha BA 0 9648
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 3806 2130 0 745 180 565
## 4490 9648 0 2891 734 2157
## IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 3806 478 11 22 34 32 312 67
## 4490 2332 40 126 191 217 1419 339
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT
## 3806 3121 0.65 0.641 0.835 0.513 NA NA
## 4490 NA NA NA NA NA NA NA
## ALT AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 3806 NA 438.11 Rural Adjacente 56030.9 267670.32 15.85
## 4490 NA 719.26 Rural Adjacente 13235.2 5398.61 17754.37
## GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 3806 19831.52 359.38 25222.60 384602.56 2115 181845.18
## 4490 32630.97 69019.14 3149.33 72168.48 10619 6796.16
## GVA_MAIN
## 3806 Eletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## 4490 Administração, defesa, educação e saúde públicas e seguridade social
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 3806 45 1 0 2 1 1 3 23 2 4
## 4490 74 2 1 4 0 0 3 37 0 3
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 3806 0 0 0 0 1 2 1 1 0 3
## 4490 1 0 0 1 2 2 12 2 0 4
## COMP_U
## 3806 0
## 4490 0
Source: https://www.latlong.net/
Source: https://www.freemaptools.com/elevation-finder.htm
Brazil_cities_allpop$LONG[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- -51.171909
Brazil_cities_allpop$LAT[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- -27.874420
Brazil_cities_allpop$ALT[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- 918
Brazil_cities_allpop$LONG[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- -39.5184
Brazil_cities_allpop$LAT[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- -12.7498
Brazil_cities_allpop$ALT[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- 210
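Because these coordinates were looked up and typed in by hand, a quick range check can catch a swapped sign or a swapped LONG/LAT pair. The sketch below compares the two filled points against an approximate bounding box. The box is taken from the rounded minimum and maximum of LONG and LAT in the earlier summary() output, so it is an assumption of this example and not an official extent.

```r
# Approximate bounding box from the rounded summary() output
lon_range <- c(-72.92, -32.44)
lat_range <- c(-33.688, 4.585)

# Manually looked-up coordinates, as assigned above
filled <- data.frame(
  CITY_STATE = c("Pinhal Da Serra_RS", "Santa Terezinha_BA"),
  LONG = c(-51.171909, -39.5184),
  LAT  = c(-27.874420, -12.7498),
  stringsAsFactors = FALSE
)

# TRUE where a point falls inside the box on both axes
in_box <- filled$LONG >= lon_range[1] & filled$LONG <= lon_range[2] &
          filled$LAT  >= lat_range[1] & filled$LAT  <= lat_range[2]
```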
Source: https://en.wikipedia.org/wiki/Japur%C3%A1
Brazil_cities_allpop[is.na(Brazil_cities_allpop$AREA), ]
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 2531 Japurá_AM Japurá AM 0 7326 7318
## IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 2531 8 1043 583 460 3235 92
## IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## 2531 369 435 478 1764 97 5451 0.522
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT ALT AREA
## 2531 0.552 0.748 0.345 -66.9969 -1.880845 69.84 NA
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL
## 2531 Rural Remoto 16398.64 2146.9 9908.92 29244.3 57.7
## TAXES GDP POP_GDP GDP_CAPITA
## 2531 1489.89 59.19 4660 12701.43
## GVA_MAIN
## 2531 Administração, defesa, educação e saúde públicas e seguridade social
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2531 16 0 0 0 0 0 0 13 0 0
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2531 0 0 0 0 1 2 0 0 0 0
## COMP_U
## 2531 0
Brazil_cities_allpop$AREA[which(Brazil_cities_allpop$CITY_STATE == "Japurá_AM")] <- 55791
summary(Brazil_cities_allpop)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5565 Length:5565
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5559
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median :0.000000 Median : 10934 Median : 10926 Median : 0.0
## Mean :0.004852 Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.:0.000000 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.0
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 0 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 472 1st Qu.: 2801
## Median : 3174 Median : 1844 Median : 919 Median : 6170
## Mean : 10300 Mean : 8856 Mean : 1444 Mean : 27595
## 3rd Qu.: 6725 3rd Qu.: 4621 3rd Qu.: 1813 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
##
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29 Min. : 1 Min. :0.4180
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.:1392 1st Qu.:0.5990
## Median : 3841 Median : 722 Median :2782 Median :0.6650
## Mean : 18212 Mean : 3004 Mean :2783 Mean :0.6592
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012 Max. :5565 Max. :0.8620
## NA's :1 NA's :1
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :1 NA's :1 NA's :1
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5565
## 1st Qu.:-22.839 1st Qu.: 169.9 1st Qu.: 204.56 Class :character
## Median :-18.090 Median : 406.5 Median : 417.26 Mode :character
## Mean :-16.446 Mean : 893.9 Mean : 1525.15
## 3rd Qu.: -8.490 3rd Qu.: 629.1 3rd Qu.: 1026.38
## Max. : 4.585 Max. :874579.0 Max. :159533.33
##
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4193 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17260
## Median : 20430 Median : 7425 Median : 31212 Median : 35809
## Mean : 47263 Mean : 176049 Mean : 489855 Mean : 123844
## 3rd Qu.: 51238 3rd Qu.: 41011 3rd Qu.: 115521 3rd Qu.: 89316
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42254 1st Qu.: 1303 1st Qu.: 43706 1st Qu.: 5488
## Median : 119481 Median : 5107 Median : 125111 Median : 11584
## Mean : 833592 Mean : 118962 Mean : 955266 Mean : 37023
## 3rd Qu.: 313988 3rd Qu.: 22209 3rd Qu.: 329717 3rd Qu.: 25102
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5565 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9062 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15866 Mode :character Median : 162.0 Median : 2.00
## Mean : 21119 Mean : 907.5 Mean : 18.27
## 3rd Qu.: 26155 3rd Qu.: 449.0 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.0 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 3.0 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.000 Median : 11.0 Median : 0.0000 Median : 0.00
## Mean : 1.853 Mean : 73.5 Mean : 0.4264 Mean : 2.03
## 3rd Qu.: 2.000 3rd Qu.: 39.0 3rd Qu.: 0.0000 3rd Qu.: 1.00
## Max. :274.000 Max. :31566.0 Max. :332.0000 Max. :657.00
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.00 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.29 Mean : 348.2 Mean : 41.02 Mean : 55.92
## 3rd Qu.: 15.00 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515.00 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.76 Mean : 15.56 Mean : 15.15 Mean : 51.34
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.00 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.00 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.77 Mean : 3.271 Mean : 30.98 Mean : 34.18
## 3rd Qu.: 14.00 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.00 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.19 Mean : 51.65 Mean : 0.05031
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
Brazil_cities_allpop[is.na(Brazil_cities_allpop$IDHM), ]
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP
## 4490 Santa Terezinha_BA Santa Terezinha BA 0 9648
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 4490 9648 0 2891 734 2157
## IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 4490 2332 40 126 191 217 1419 339
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## 4490 NA NA NA NA NA -39.5184
## LAT ALT AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 4490 -12.7498 210 719.26 Rural Adjacente 13235.2 5398.61 17754.37
## GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 4490 32630.97 69019.14 3149.33 72168.48 10619 6796.16
## GVA_MAIN
## 4490 Administração, defesa, educação e saúde públicas e seguridade social
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 4490 74 2 1 4 0 0 3 37 0 3
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 4490 1 0 0 1 2 2 12 2 0 4
## COMP_U
## 4490 0
Unfortunately, we cannot use this data point because we are unable to replace the remaining missing values for the Human Development Index variables. For the purpose of this study, this data point will also be excluded.
# Keep only the rows with a non-missing IDHM value
Brazil_cities_cleaned <- Brazil_cities_allpop[!is.na(Brazil_cities_allpop$IDHM), ]
summary(Brazil_cities_cleaned)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5564 Length:5564
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5558
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.00
## 1st Qu.:0.000000 1st Qu.: 5234 1st Qu.: 5228 1st Qu.: 0.00
## Median :0.000000 Median : 10935 Median : 10930 Median : 0.00
## Mean :0.004853 Mean : 34282 Mean : 34205 Mean : 77.52
## 3rd Qu.:0.000000 3rd Qu.: 23446 3rd Qu.: 23392 3rd Qu.: 10.00
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.00
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 0.0 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 471.8 1st Qu.: 2802
## Median : 3174 Median : 1845 Median : 918.5 Median : 6174
## Mean : 10301 Mean : 8857 Mean : 1443.8 Mean : 27599
## 3rd Qu.: 6726 3rd Qu.: 4622 3rd Qu.: 1813.0 3rd Qu.: 15303
## Max. :3576148 Max. :3548433 Max. :33809.0 Max. :10463636
##
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5.0 Min. : 7 Min. : 12.0
## 1st Qu.: 38.0 1st Qu.: 158.0 1st Qu.: 220 1st Qu.: 259.8
## Median : 92.0 Median : 376.5 Median : 516 Median : 588.5
## Mean : 383.3 Mean : 1544.8 Mean : 2070 Mean : 2381.8
## 3rd Qu.: 232.0 3rd Qu.: 951.2 3rd Qu.: 1300 3rd Qu.: 1478.2
## Max. :129464.0 Max. :514794.0 Max. :684443 Max. :783702.0
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29.0 Min. : 1 Min. :0.4180
## 1st Qu.: 1735 1st Qu.: 341.0 1st Qu.:1392 1st Qu.:0.5990
## Median : 3842 Median : 722.5 Median :2782 Median :0.6650
## Mean : 18215 Mean : 3004.7 Mean :2783 Mean :0.6592
## 3rd Qu.: 9629 3rd Qu.: 1724.2 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012.0 Max. :5565 Max. :0.8620
##
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
##
## LAT ALT AREA RURAL_URBAN
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Length:5564
## 1st Qu.:-22.839 1st Qu.: 169.8 1st Qu.: 204.53 Class :character
## Median :-18.091 Median : 406.5 Median : 416.59 Mode :character
## Mean :-16.447 Mean : 894.0 Mean : 1525.29
## 3rd Qu.: -8.489 3rd Qu.: 629.1 3rd Qu.: 1026.44
## Max. : 4.585 Max. :874579.0 Max. :159533.33
##
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 1 Min. : 2 Min. : 7
## 1st Qu.: 4192 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17258
## Median : 20432 Median : 7428 Median : 31214 Median : 35837
## Mean : 47270 Mean : 176080 Mean : 489940 Mean : 123860
## 3rd Qu.: 51239 3rd Qu.: 41015 3rd Qu.: 115552 3rd Qu.: 89328
## Max. :1402282 Max. :63306755 Max. :464656988 Max. :41902893
##
## GVA_TOTAL TAXES GDP POP_GDP
## Min. : 17 Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 42254 1st Qu.: 1302 1st Qu.: 43691 1st Qu.: 5486
## Median : 119492 Median : 5108 Median : 125153 Median : 11584
## Mean : 833729 Mean : 118983 Mean : 955425 Mean : 37028
## 3rd Qu.: 314039 3rd Qu.: 22219 3rd Qu.: 329733 3rd Qu.: 25105
## Max. :569910503 Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN COMP_TOT COMP_A
## Min. : 3191 Length:5564 Min. : 6.0 Min. : 0.00
## 1st Qu.: 9062 Class :character 1st Qu.: 68.0 1st Qu.: 1.00
## Median : 15870 Mode :character Median : 162.0 Median : 2.00
## Mean : 21122 Mean : 907.6 Mean : 18.27
## 3rd Qu.: 26155 3rd Qu.: 449.2 3rd Qu.: 8.00
## Max. :314638 Max. :530446.0 Max. :1948.00
##
## COMP_B COMP_C COMP_D COMP_E
## Min. : 0.000 Min. : 0.00 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.000 Median : 11.00 Median : 0.0000 Median : 0.000
## Mean : 1.853 Mean : 73.51 Mean : 0.4265 Mean : 2.031
## 3rd Qu.: 2.000 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000
## Max. :274.000 Max. :31566.00 Max. :332.0000 Max. :657.000
##
## COMP_F COMP_G COMP_H COMP_I
## Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 4.00 Median : 75.0 Median : 7.00 Median : 7.00
## Mean : 43.29 Mean : 348.3 Mean : 41.03 Mean : 55.93
## 3rd Qu.: 15.00 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00
## Max. :25222.00 Max. :150633.0 Max. :19515.00 Max. :29290.00
##
## COMP_J COMP_K COMP_L COMP_M
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 1.00 Median : 0.00 Median : 0.00 Median : 4.00
## Mean : 24.77 Mean : 15.57 Mean : 15.15 Mean : 51.34
## 3rd Qu.: 5.00 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00
## Max. :38720.00 Max. :23738.00 Max. :14003.00 Max. :49181.00
##
## COMP_N COMP_O COMP_P COMP_Q
## Min. : 0.00 Min. : 1.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00
## Median : 4.00 Median : 2.000 Median : 6.00 Median : 3.00
## Mean : 83.78 Mean : 3.271 Mean : 30.98 Mean : 34.18
## 3rd Qu.: 14.00 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00
## Max. :76757.00 Max. :204.000 Max. :16030.00 Max. :22248.00
##
## COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.00 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.: 5.00 1st Qu.: 0.00000
## Median : 2.00 Median : 12.00 Median : 0.00000
## Mean : 12.19 Mean : 51.66 Mean : 0.05032
## 3rd Qu.: 6.00 3rd Qu.: 31.00 3rd Qu.: 0.00000
## Max. :6687.00 Max. :24832.00 Max. :123.00000
##
Overall, we lost a total of 9 rows during data cleaning: 3 were missing the dependent variable (GDP per capita), 5 were missing a large number of variables, and 1 was missing its IDHM values.
Overall, we reduced the number of variables from 81 to 59. We added 1 variable as a unique identifier for each state, removed 22 variables because their data was recorded after the year of our dependent variable (2016), and removed 1 variable because a large portion of its values were missing.
In order to formulate our indicators, we need to create some derived variables so that the indicators in our explanatory model are not correlated with one another, or with the dependent variable, through some underlying factor. Since our dependent variable is a metric divided by population, we need to adjust the values that depend on population in some way.
We will be taking 3 different approaches in this case.
Using Ratios rather than counts for metrics where we have totals. E.g. (foreign resident population / total resident population)
Using the values divided by POP_GDP which is the population scale used to formulate GDP Per capita.
We can derive more variables for our analysis by converting categorical variables into binary arrays. This will allow us to retain our categorical variables during our regression by making them into dummy variables.
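The first two approaches can be sketched on a toy data frame as follows. The numbers are made up for illustration, and `COMP_TOT_PER_CAPITA` is a hypothetical column name chosen for this example. The column names otherwise follow the dataset.

```r
# Toy data frame (illustrative values only)
toy <- data.frame(
  IBGE_RES_POP      = c(1000, 20000),
  IBGE_RES_POP_ESTR = c(10, 0),
  COMP_TOT          = c(50, 1500),
  POP_GDP           = c(1100, 21000)
)

# Approach 1: express a count as a share of its known total
toy$RES_FOREIGN_POP_RATIO <- toy$IBGE_RES_POP_ESTR / toy$IBGE_RES_POP

# Approach 2: scale a count by the population used for GDP per capita
toy$COMP_TOT_PER_CAPITA <- toy$COMP_TOT / toy$POP_GDP
```

Both transformations remove the direct dependence on population size, so a large municipality no longer scores high on every indicator simply because it is large.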
Examining GVA_MAIN
unique(Brazil_cities_cleaned[,37])
## [1] "Demais serviços"
## [2] "Administração, defesa, educação e saúde públicas e seguridade social"
## [3] "Agricultura, inclusive apoio à agricultura e a pós colheita"
## [4] "Indústrias de transformação"
## [5] "Pecuária, inclusive apoio à pecuária"
## [6] "Eletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação"
## [7] "Comércio e reparação de veículos automotores e motocicletas"
## [8] "Indústrias extrativas"
## [9] "Construção"
## [10] "Produção florestal, pesca e aquicultura"
Examining RURAL_URBAN
unique(Brazil_cities_cleaned[,27])
## [1] "Urbano" "Rural Adjacente"
## [3] "Rural Remoto" "Intermediário Adjacente"
## [5] "Intermediário Remoto"
Creating Dummy Variable Arrays
Brazil_cities_CAT <- cbind(Brazil_cities_cleaned, as.data.frame(with(Brazil_cities_cleaned, model.matrix(~ RURAL_URBAN + 0))))
Brazil_cities_CAT <- cbind(Brazil_cities_CAT, as.data.frame(with(Brazil_cities_cleaned, model.matrix(~ GVA_MAIN + 0))))
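As a minimal sketch of what `model.matrix(~ VAR + 0)` produces, the toy example below uses a made-up data frame (the rows are hypothetical; only the column name mirrors RURAL_URBAN). It also contrasts the full dummy set with the baseline-coded set that `model.matrix` returns when the intercept is kept.

```r
# Toy data: the column name mirrors RURAL_URBAN, but the rows are made up.
toy <- data.frame(
  RURAL_URBAN = c("Urbano", "Rural Remoto", "Urbano", "Rural Adjacente")
)

# "+ 0" drops the intercept, so every level gets its own 0/1 column.
full_dummies <- as.data.frame(model.matrix(~ RURAL_URBAN + 0, data = toy))

# Keeping the intercept makes the first level the baseline,
# so it has no column of its own.
baseline_dummies <- as.data.frame(model.matrix(~ RURAL_URBAN, data = toy))

# Each row of the full set has exactly one 1, so the columns sum to 1.
row_totals <- rowSums(full_dummies)

print(names(full_dummies))
print(names(baseline_dummies))
print(row_totals)
```

Because the full set always sums to 1 across a row, it is perfectly collinear with a regression intercept. If all dummies of one categorical variable enter a model that also has an intercept, one of them is redundant and would normally be dropped or treated as the reference level.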
Dropping Categorical Columns
dropCategorical <- c("GVA_MAIN", "RURAL_URBAN")
Brazil_cities_withDummy <- Brazil_cities_CAT[ , !(names(Brazil_cities_CAT) %in% dropCategorical)]
To control for population differences, we take ratios instead of raw counts, which gives a better picture of the makeup of each municipality.
After examining the data and its source, there appears to be an error in the GVA totals. This would greatly affect our GVA ratios. On inspection of the source data, all other GVA values are correct except the totals, and it is not clear where the totals come from. We therefore replace them by summing the values of each GVA category to form new GVA totals.
Brazil_cities_withDummy <- Brazil_cities_withDummy %>%
mutate(` GVA_TOTAL ` = as.numeric(rowSums(.[27:30])))
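The recomputation can be illustrated on toy numbers. The sketch below is not the code used in the analysis: the values and the column names `GVA_TOTAL_REPORTED`, `GVA_TOTAL_RECOMPUTED` and `MISMATCH` are hypothetical, and it selects the component columns by name rather than by position.

```r
# Toy GVA table; all values are invented for illustration.
gva <- data.frame(
  GVA_AGROPEC        = c(10, 5),
  GVA_INDUSTRY       = c(20, 5),
  GVA_SERVICES       = c(30, 5),
  GVA_PUBLIC         = c(40, 5),
  GVA_TOTAL_REPORTED = c(100, 25)   # second row does not match its components
)

# Selecting by name keeps working if the column order changes.
component_cols <- c("GVA_AGROPEC", "GVA_INDUSTRY", "GVA_SERVICES", "GVA_PUBLIC")
gva$GVA_TOTAL_RECOMPUTED <- rowSums(gva[, component_cols])

# Flag rows where the reported total disagrees with the sum of its parts.
gva$MISMATCH <- abs(gva$GVA_TOTAL_REPORTED - gva$GVA_TOTAL_RECOMPUTED) > 1e-6

print(gva)
```

Selecting by name is a design choice: positional indices such as `27:30` silently point at different columns if variables are added or dropped earlier in the pipeline.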
Brazil_cities_Derived <- Brazil_cities_withDummy %>%
# Foreign vs Local population
mutate(RES_BRAZ_POP_RATIO = ifelse((IBGE_RES_POP_BRAS == 0), 0, (IBGE_RES_POP_BRAS/IBGE_RES_POP))) %>%
mutate(RES_FOREIGN_POP_RATIO = ifelse((IBGE_RES_POP_ESTR == 0), 0, (IBGE_RES_POP_ESTR/IBGE_RES_POP))) %>%
# Rural vs Urban Domestic Units
mutate(DOM_URBAN_RATIO = ifelse((IBGE_DU_URBAN == 0), 0, (IBGE_DU_URBAN/IBGE_DU)))%>%
mutate(DOM_RURAL_RATIO = ifelse((IBGE_DU_RURAL == 0), 0, (IBGE_DU_RURAL/IBGE_DU)))%>%
# Residential Population Age Ratios
mutate(POP_BEL_ONE_RATIO = ifelse((IBGE_1 == 0), 0, (IBGE_1/IBGE_POP)))%>%
mutate(POP_ONE_to_FOUR_RATIO = ifelse((`IBGE_1-4` == 0), 0, (`IBGE_1-4`/IBGE_POP)))%>%
mutate(POP_FIVE_to_NINE_RATIO = ifelse((`IBGE_5-9` == 0), 0, (`IBGE_5-9`/IBGE_POP)))%>%
mutate(POP_TEN_to_FOURTEEN_RATIO = ifelse((`IBGE_10-14` == 0), 0, (`IBGE_10-14`/IBGE_POP)))%>%
mutate(POP_WORKING_RATIO = ifelse((`IBGE_15-59` == 0), 0, (`IBGE_15-59`/IBGE_POP))) %>%
mutate(POP_ELDERLY_RATIO = ifelse((`IBGE_60+` == 0), 0, (`IBGE_60+`/IBGE_POP)))%>%
# Gross Value Added Ratios
mutate(GVA_AGROPEC_RATIO = ifelse((GVA_AGROPEC == 0), 0, (GVA_AGROPEC/as.numeric(` GVA_TOTAL `))))%>%
mutate(GVA_INDUSTRY_RATIO = ifelse((GVA_INDUSTRY == 0), 0, (GVA_INDUSTRY/as.numeric(` GVA_TOTAL `))))%>%
mutate(GVA_SERVICES_RATIO = ifelse((GVA_SERVICES == 0), 0, (GVA_SERVICES/as.numeric(` GVA_TOTAL `))))%>%
mutate(GVA_PUBLIC_RATIO = ifelse((GVA_PUBLIC == 0), 0, (GVA_PUBLIC/as.numeric(` GVA_TOTAL `))))%>%
# Company Ratios
mutate(COM_A_RATIO = ifelse((COMP_A == 0), 0, (COMP_A/COMP_TOT)))%>%
mutate(COM_B_RATIO = ifelse((COMP_B == 0), 0, (COMP_B/COMP_TOT)))%>%
mutate(COM_C_RATIO = ifelse((COMP_C == 0), 0, (COMP_C/COMP_TOT)))%>%
mutate(COM_D_RATIO = ifelse((COMP_D == 0), 0, (COMP_D/COMP_TOT)))%>%
mutate(COM_E_RATIO = ifelse((COMP_E == 0), 0, (COMP_E/COMP_TOT)))%>%
mutate(COM_F_RATIO = ifelse((COMP_F == 0), 0, (COMP_F/COMP_TOT)))%>%
mutate(COM_G_RATIO = ifelse((COMP_G == 0), 0, (COMP_G/COMP_TOT)))%>%
mutate(COM_H_RATIO = ifelse((COMP_H == 0), 0, (COMP_H/COMP_TOT)))%>%
mutate(COM_I_RATIO = ifelse((COMP_I == 0), 0, (COMP_I/COMP_TOT)))%>%
mutate(COM_J_RATIO = ifelse((COMP_J == 0), 0, (COMP_J/COMP_TOT)))%>%
mutate(COM_K_RATIO = ifelse((COMP_K == 0), 0, (COMP_K/COMP_TOT)))%>%
mutate(COM_L_RATIO = ifelse((COMP_L == 0), 0, (COMP_L/COMP_TOT)))%>%
mutate(COM_M_RATIO = ifelse((COMP_M == 0), 0, (COMP_M/COMP_TOT)))%>%
mutate(COM_N_RATIO = ifelse((COMP_N == 0), 0, (COMP_N/COMP_TOT)))%>%
mutate(COM_O_RATIO = ifelse((COMP_O == 0), 0, (COMP_O/COMP_TOT)))%>%
mutate(COM_P_RATIO = ifelse((COMP_P == 0), 0, (COMP_P/COMP_TOT)))%>%
mutate(COM_Q_RATIO = ifelse((COMP_Q == 0), 0, (COMP_Q/COMP_TOT)))%>%
mutate(COM_R_RATIO = ifelse((COMP_R == 0), 0, (COMP_R/COMP_TOT)))%>%
mutate(COM_S_RATIO = ifelse((COMP_S == 0), 0, (COMP_S/COMP_TOT)))%>%
mutate(COM_U_RATIO = ifelse((COMP_U == 0), 0, (COMP_U/COMP_TOT)))
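The repeated pattern above can also be written with a small helper. The following is a sketch on toy data only; `safe_ratio` is a hypothetical helper name, and unlike the chain above it guards on the denominator, which is the case that would otherwise produce `NaN` or `Inf`.

```r
# Hypothetical helper: return 0 where the denominator is 0, else the ratio.
safe_ratio <- function(num, den) ifelse(den == 0, 0, num / den)

# Toy company counts; the second row has no companies at all.
toy <- data.frame(
  COMP_TOT = c(10, 0, 4),
  COMP_A   = c(2, 0, 1),
  COMP_B   = c(8, 0, 3)
)

# Build COM_<x>_RATIO columns in a loop instead of one mutate per column.
for (col in c("COMP_A", "COMP_B")) {
  new_name <- sub("^COMP_", "COM_", paste0(col, "_RATIO"))
  toy[[new_name]] <- safe_ratio(toy[[col]], toy$COMP_TOT)
}

print(toy)
```

In the summary of the real data the minimum of COMP_TOT is 6, so the numerator guard and the denominator guard give the same result there. The denominator guard is simply the more defensive choice.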
Brazil_cities_Derived <- Brazil_cities_Derived %>%
mutate(POP_DENSITY = POP_GDP/AREA)
summary(Brazil_cities_Derived)
## CITY_STATE CITY STATE
## Abadia De Goiás_GO : 1 Length:5564 Length:5564
## Abadia Dos Dourados_MG: 1 Class :character Class :character
## Abadiânia_GO : 1 Mode :character Mode :character
## Abaeté_MG : 1
## Abaetetuba_PA : 1
## Abaiara_CE : 1
## (Other) :5558
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. :0.000000 Min. : 805 Min. : 805 Min. : 0.00
## 1st Qu.:0.000000 1st Qu.: 5234 1st Qu.: 5228 1st Qu.: 0.00
## Median :0.000000 Median : 10935 Median : 10930 Median : 0.00
## Mean :0.004853 Mean : 34282 Mean : 34205 Mean : 77.52
## 3rd Qu.:0.000000 3rd Qu.: 23446 3rd Qu.: 23392 3rd Qu.: 10.00
## Max. :1.000000 Max. :11253503 Max. :11133776 Max. :119727.00
##
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 0.0 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 471.8 1st Qu.: 2802
## Median : 3174 Median : 1845 Median : 918.5 Median : 6174
## Mean : 10301 Mean : 8857 Mean : 1443.8 Mean : 27599
## 3rd Qu.: 6726 3rd Qu.: 4622 3rd Qu.: 1813.0 3rd Qu.: 15303
## Max. :3576148 Max. :3548433 Max. :33809.0 Max. :10463636
##
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5.0 Min. : 7 Min. : 12.0
## 1st Qu.: 38.0 1st Qu.: 158.0 1st Qu.: 220 1st Qu.: 259.8
## Median : 92.0 Median : 376.5 Median : 516 Median : 588.5
## Mean : 383.3 Mean : 1544.8 Mean : 2070 Mean : 2381.8
## 3rd Qu.: 232.0 3rd Qu.: 951.2 3rd Qu.: 1300 3rd Qu.: 1478.2
## Max. :129464.0 Max. :514794.0 Max. :684443 Max. :783702.0
##
## IBGE_15-59 IBGE_60+ IDHM Ranking 2010 IDHM
## Min. : 94 Min. : 29.0 Min. : 1 Min. :0.4180
## 1st Qu.: 1735 1st Qu.: 341.0 1st Qu.:1392 1st Qu.:0.5990
## Median : 3842 Median : 722.5 Median :2782 Median :0.6650
## Mean : 18215 Mean : 3004.7 Mean :2783 Mean :0.6592
## 3rd Qu.: 9629 3rd Qu.: 1724.2 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :7058221 Max. :1293012.0 Max. :5565 Max. :0.8620
##
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
##
## LAT ALT AREA GVA_AGROPEC
## Min. :-33.688 Min. : 0.0 Min. : 3.57 Min. : 0
## 1st Qu.:-22.839 1st Qu.: 169.8 1st Qu.: 204.53 1st Qu.: 4192
## Median :-18.091 Median : 406.5 Median : 416.59 Median : 20432
## Mean :-16.447 Mean : 894.0 Mean : 1525.29 Mean : 47270
## 3rd Qu.: -8.489 3rd Qu.: 629.1 3rd Qu.: 1026.44 3rd Qu.: 51239
## Max. : 4.585 Max. :874579.0 Max. :159533.33 Max. :1402282
##
## GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL
## Min. : 1 Min. : 2 Min. : 7 Min. : 128
## 1st Qu.: 1725 1st Qu.: 10113 1st Qu.: 17258 1st Qu.: 57490
## Median : 7428 Median : 31214 Median : 35837 Median : 124586
## Mean : 176080 Mean : 489940 Mean : 123860 Mean : 837149
## 3rd Qu.: 41015 3rd Qu.: 115552 3rd Qu.: 89328 3rd Qu.: 352878
## Max. :63306755 Max. :464656988 Max. :41902893 Max. :569910503
##
## TAXES GDP POP_GDP GDP_CAPITA
## Min. : -14159 Min. : 15 Min. : 815 Min. : 3191
## 1st Qu.: 1302 1st Qu.: 43691 1st Qu.: 5486 1st Qu.: 9062
## Median : 5108 Median : 125153 Median : 11584 Median : 15870
## Mean : 118983 Mean : 955425 Mean : 37028 Mean : 21122
## 3rd Qu.: 22219 3rd Qu.: 329733 3rd Qu.: 25105 3rd Qu.: 26155
## Max. :117125387 Max. :687035890 Max. :12038175 Max. :314638
##
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 162.0 Median : 2.00 Median : 0.000 Median : 11.00
## Mean : 907.6 Mean : 18.27 Mean : 1.853 Mean : 73.51
## 3rd Qu.: 449.2 3rd Qu.: 8.00 3rd Qu.: 2.000 3rd Qu.: 39.00
## Max. :530446.0 Max. :1948.00 Max. :274.000 Max. :31566.00
##
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 1.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 32.0
## Median : 0.0000 Median : 0.000 Median : 4.00 Median : 75.0
## Mean : 0.4265 Mean : 2.031 Mean : 43.29 Mean : 348.3
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00 3rd Qu.: 200.0
## Max. :332.0000 Max. :657.000 Max. :25222.00 Max. :150633.0
##
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7.00 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41.03 Mean : 55.93 Mean : 24.77 Mean : 15.57
## 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515.00 Max. :29290.00 Max. :38720.00 Max. :23738.00
##
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 1.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 2.000
## Median : 0.00 Median : 4.00 Median : 4.00 Median : 2.000
## Mean : 15.15 Mean : 51.34 Mean : 83.78 Mean : 3.271
## 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00 3rd Qu.: 3.000
## Max. :14003.00 Max. :49181.00 Max. :76757.00 Max. :204.000
##
## COMP_P COMP_Q COMP_R COMP_S
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.: 5.00
## Median : 6.00 Median : 3.00 Median : 2.00 Median : 12.00
## Mean : 30.98 Mean : 34.18 Mean : 12.19 Mean : 51.66
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00 3rd Qu.: 31.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00 Max. :24832.00
##
## COMP_U RURAL_URBANIntermediário Adjacente
## Min. : 0.00000 Min. :0.0000
## 1st Qu.: 0.00000 1st Qu.:0.0000
## Median : 0.00000 Median :0.0000
## Mean : 0.05032 Mean :0.1233
## 3rd Qu.: 0.00000 3rd Qu.:0.0000
## Max. :123.00000 Max. :1.0000
##
## RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :1.0000
## Mean :0.01078 Mean :0.5462
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000
##
## RURAL_URBANRural Remoto RURAL_URBANUrbano
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000
## Mean :0.05805 Mean :0.2617
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000
##
## GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4892
## 3rd Qu.:1.0000
## Max. :1.0000
##
## GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1317
## 3rd Qu.:0.0000
## Max. :1.0000
##
## GVA_MAINComércio e reparação de veículos automotores e motocicletas
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.000000
## Mean :0.008267
## 3rd Qu.:0.000000
## Max. :1.000000
##
## GVA_MAINConstrução GVA_MAINDemais serviços
## Min. :0.000000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.0000
## Median :0.000000 Median :0.0000
## Mean :0.001258 Mean :0.2653
## 3rd Qu.:0.000000 3rd Qu.:1.0000
## Max. :1.000000 Max. :1.0000
##
## GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.01761
## 3rd Qu.:0.00000
## Max. :1.00000
##
## GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000
## Mean :0.04691 Mean :0.00629
## 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000
##
## GVA_MAINPecuária, inclusive apoio à pecuária
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.02894
## 3rd Qu.:0.00000
## Max. :1.00000
##
## GVA_MAINProdução florestal, pesca e aquicultura RES_BRAZ_POP_RATIO
## Min. :0.000000 Min. :0.6228
## 1st Qu.:0.000000 1st Qu.:0.9993
## Median :0.000000 Median :1.0000
## Mean :0.004493 Mean :0.9992
## 3rd Qu.:0.000000 3rd Qu.:1.0000
## Max. :1.000000 Max. :1.0000
##
## RES_FOREIGN_POP_RATIO DOM_URBAN_RATIO DOM_RURAL_RATIO POP_BEL_ONE_RATIO
## Min. :0.0000000 Min. :0.04553 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000000 1st Qu.:0.49148 1st Qu.:0.1696 1st Qu.:0.01209
## Median :0.0000000 Median :0.66263 Median :0.3374 Median :0.01418
## Mean :0.0007593 Mean :0.65205 Mean :0.3479 Mean :0.01445
## 3rd Qu.:0.0006992 3rd Qu.:0.83040 3rd Qu.:0.5085 3rd Qu.:0.01651
## Max. :0.3772182 Max. :1.00000 Max. :0.9545 Max. :0.03314
##
## POP_ONE_to_FOUR_RATIO POP_FIVE_to_NINE_RATIO POP_TEN_to_FOURTEEN_RATIO
## Min. :0.01008 Min. :0.02482 Min. :0.03491
## 1st Qu.:0.05018 1st Qu.:0.06942 1st Qu.:0.08227
## Median :0.05826 Median :0.08012 Median :0.09207
## Mean :0.05951 Mean :0.08169 Mean :0.09344
## 3rd Qu.:0.06717 3rd Qu.:0.09180 3rd Qu.:0.10346
## Max. :0.11881 Max. :0.16247 Max. :0.16649
##
## POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO
## Min. :0.4716 Min. :0.02255 Min. :0.00000 Min. :0.0000157
## 1st Qu.:0.6087 1st Qu.:0.09799 1st Qu.:0.03364 1st Qu.:0.0368730
## Median :0.6325 Median :0.11921 Median :0.15062 Median :0.0714602
## Mean :0.6308 Mean :0.12009 Mean :0.21034 Mean :0.1377745
## 3rd Qu.:0.6543 3rd Qu.:0.14103 3rd Qu.:0.34094 3rd Qu.:0.1795132
## Max. :0.7448 Max. :0.42199 Max. :0.99877 Max. :0.9991868
##
## GVA_SERVICES_RATIO GVA_PUBLIC_RATIO COM_A_RATIO COM_B_RATIO
## Min. :0.0000461 Min. :0.0000433 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.1985910 1st Qu.:0.1448472 1st Qu.:0.001569 1st Qu.:0.000000
## Median :0.3117002 Median :0.2948082 Median :0.011803 Median :0.000000
## Mean :0.3260963 Mean :0.3257928 Mean :0.039408 Mean :0.006019
## 3rd Qu.:0.4600063 3rd Qu.:0.4966551 3rd Qu.:0.031915 3rd Qu.:0.005188
## Max. :0.9995977 Max. :0.9996029 Max. :0.917085 Max. :0.333333
##
## COM_C_RATIO COM_D_RATIO COM_E_RATIO COM_F_RATIO
## Min. :0.00000 Min. :0.0000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.03636 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.01389
## Median :0.06590 Median :0.0000000 Median :0.000000 Median :0.02778
## Mean :0.07967 Mean :0.0007847 Mean :0.002508 Mean :0.03130
## 3rd Qu.:0.10593 3rd Qu.:0.0000000 3rd Qu.:0.003226 3rd Qu.:0.04348
## Max. :0.54518 Max. :0.4444444 Max. :0.083333 Max. :0.29213
##
## COM_G_RATIO COM_H_RATIO COM_I_RATIO COM_J_RATIO
## Min. :0.01789 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.38980 1st Qu.:0.01562 1st Qu.:0.02128 1st Qu.:0.000000
## Median :0.46396 Median :0.03757 Median :0.04167 Median :0.007299
## Mean :0.47234 Mean :0.04955 Mean :0.04567 Mean :0.009054
## 3rd Qu.:0.55263 3rd Qu.:0.07052 3rd Qu.:0.06202 3rd Qu.:0.013982
## Max. :0.89091 Max. :0.43689 Max. :0.52542 Max. :0.417249
##
## COM_K_RATIO COM_L_RATIO COM_M_RATIO COM_N_RATIO
## Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.01144 1st Qu.:0.01802
## Median :0.000000 Median :0.000000 Median :0.02362 Median :0.02924
## Mean :0.003933 Mean :0.005450 Mean :0.02536 Mean :0.03553
## 3rd Qu.:0.006112 3rd Qu.:0.008601 3rd Qu.:0.03659 3rd Qu.:0.04496
## Max. :0.087912 Max. :0.156863 Max. :0.24444 Max. :0.33527
##
## COM_O_RATIO COM_P_RATIO COM_Q_RATIO COM_R_RATIO
## Min. :0.0001764 Min. :0.00000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0058954 1st Qu.:0.01786 1st Qu.:0.006615 1st Qu.:0.000000
## Median :0.0153846 Median :0.02985 Median :0.019946 Median :0.009091
## Mean :0.0277867 Mean :0.04350 Mean :0.022028 Mean :0.010772
## 3rd Qu.:0.0361664 3rd Qu.:0.04878 3rd Qu.:0.033033 3rd Qu.:0.015310
## Max. :0.3636364 Max. :0.83673 Max. :0.214286 Max. :0.166667
##
## COM_S_RATIO COM_U_RATIO POP_DENSITY
## Min. :0.00000 Min. :0.000e+00 Min. : 0.084
## 1st Qu.:0.04116 1st Qu.:0.000e+00 1st Qu.: 11.939
## Median :0.06395 Median :0.000e+00 Median : 25.306
## Mean :0.08933 Mean :2.036e-06 Mean : 117.271
## 3rd Qu.:0.11147 3rd Qu.:0.000e+00 3rd Qu.: 55.443
## Max. :0.56716 Max. :2.985e-03 Max. :13533.497
##
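The data looks good, though we should note that many of the ratios have a maximum value below 1. It would be prudent not to normalize them, since they are already on a common 0 to 1 scale and rescaling would distort their meaning as shares.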
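To see why rescaling a ratio changes its meaning, consider min-max scaling. The sketch below assumes that the normalization applied to the count variables is a min-max rescaling to the 0 to 1 range; `min_max` is a hypothetical stand-in written in base R, and the numbers are illustrative.

```r
# Hypothetical min-max helper: maps the smallest value to 0 and the largest to 1.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Counts span several orders of magnitude, so rescaling makes them comparable.
counts <- c(6, 68, 162, 530446)
scaled_counts <- min_max(counts)

# Shares are already bounded by 0 and 1. Here the largest share is one third.
shares <- c(0, 0.005, 0.006, 0.333)
scaled_shares <- min_max(shares)

print(scaled_counts)
print(scaled_shares)   # the largest share is stretched from 0.333 to 1
```

After rescaling, a value of 1 no longer means "all companies are in this sector" but only "the highest share observed", which is why the ratios are left untouched.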
Brazil_cities_Derived[73:74]%>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Brazil_cities_Derived[75:76] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Brazil_cities_Derived[77:82] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Brazil_cities_Derived[83:86] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Brazil_cities_Derived[87:106] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Brazil_cities.sf <- st_as_sf(Brazil_cities_Derived,
coords = c("LONG", "LAT"),
crs=4326) %>%
st_transform(crs=4674)
head(Brazil_cities.sf)
## Simple feature collection with 6 features and 104 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -49.44055 ymin: -19.15585 xmax: -39.04755 ymax: -1.72347
## geographic CRS: SIRGAS 2000
## CITY_STATE CITY STATE CAPITAL IBGE_RES_POP
## 1 Abadia De Goiás_GO Abadia De Goiás GO 0 6876
## 2 Abadia Dos Dourados_MG Abadia Dos Dourados MG 0 6704
## 3 Abadiânia_GO Abadiânia GO 0 15757
## 4 Abaeté_MG Abaeté MG 0 22690
## 5 Abaetetuba_PA Abaetetuba PA 0 141100
## 6 Abaiara_CE Abaiara CE 0 10496
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1 6876 0 2137 1546 591
## 2 6704 0 2328 1481 847
## 3 15609 148 4655 3233 1422
## 4 22690 0 7694 6667 1027
## 5 141040 60 31061 19057 12004
## 6 10496 0 2791 1251 1540
## IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 1 5300 69 318 438 517 3542 416
## 2 4154 38 207 260 351 2709 589
## 3 10656 139 650 894 1087 6896 990
## 4 18464 176 856 1233 1539 11979 2681
## 5 82956 1354 5567 7618 8905 53516 5996
## 6 4538 98 323 421 483 2631 582
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao ALT
## 1 1689 0.708 0.687 0.830 0.622 893.60
## 2 2207 0.690 0.693 0.839 0.563 753.12
## 3 2202 0.690 0.671 0.841 0.579 1017.55
## 4 1994 0.698 0.720 0.848 0.556 644.74
## 5 3530 0.628 0.579 0.798 0.537 10.12
## 6 3522 0.628 0.540 0.748 0.612 403.11
## AREA GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## 1 147.26 6.20 27991.25 74750.32 36915.04 139662.81 20554.20
## 2 881.06 50524.57 25917.70 62689.23 28083.79 167215.29 12873.50
## 3 1045.13 42.84 16728.30 138198.58 63396.20 218365.92 26822.58
## 4 1817.07 113824.60 31002.62 172.33 86081.41 231080.96 26994.09
## 5 1610.65 140463.72 58610.00 468128.69 486872.40 1154074.81 95180.48
## 6 180.08 4435.16 5.88 22.81 35989.96 40453.81 4042.79
## GDP POP_GDP GDP_CAPITA COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E
## 1 166.41 8053 20664.57 284 5 1 56 0 2
## 2 180.09 7037 25591.70 476 6 6 30 1 2
## 3 287984.49 18427 15628.40 288 5 9 26 0 2
## 4 430235.36 23574 18250.42 621 18 1 40 0 1
## 5 1249255.29 151934 8222.36 931 4 2 43 0 1
## 6 73151.46 11483 6370.41 86 1 0 4 0 0
## COMP_F COMP_G COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P
## 1 29 110 26 4 5 0 2 10 12 4 6
## 2 34 190 70 28 11 0 4 15 29 2 9
## 3 7 117 12 57 2 1 0 7 15 3 11
## 4 20 303 62 30 9 6 4 28 27 2 15
## 5 27 500 16 31 6 1 1 22 16 2 155
## 6 6 48 2 10 2 0 0 2 3 2 0
## COMP_Q COMP_R COMP_S COMP_U RURAL_URBANIntermediário Adjacente
## 1 6 1 5 0 0
## 2 14 6 19 0 0
## 3 5 1 8 0 0
## 4 19 9 27 0 0
## 5 33 15 56 0 0
## 6 2 0 4 0 0
## RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
## 1 0 0
## 2 0 1
## 3 0 1
## 4 0 0
## 5 0 0
## 6 0 1
## RURAL_URBANRural Remoto RURAL_URBANUrbano
## 1 0 1
## 2 0 0
## 3 0 0
## 4 0 1
## 5 0 1
## 6 0 0
## GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
## 1 0
## 2 0
## 3 0
## 4 0
## 5 1
## 6 1
## GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## GVA_MAINComércio e reparação de veículos automotores e motocicletas
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## GVA_MAINConstrução GVA_MAINDemais serviços
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 0
## 6 0 0
## GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## GVA_MAINPecuária, inclusive apoio à pecuária
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## GVA_MAINProdução florestal, pesca e aquicultura RES_BRAZ_POP_RATIO
## 1 0 1.0000000
## 2 0 1.0000000
## 3 0 0.9906073
## 4 0 1.0000000
## 5 0 0.9995748
## 6 0 1.0000000
## RES_FOREIGN_POP_RATIO DOM_URBAN_RATIO DOM_RURAL_RATIO POP_BEL_ONE_RATIO
## 1 0.0000000000 0.7234441 0.2765559 0.013018868
## 2 0.0000000000 0.6361684 0.3638316 0.009147809
## 3 0.0093926509 0.6945220 0.3054780 0.013044294
## 4 0.0000000000 0.8665194 0.1334806 0.009532062
## 5 0.0004252303 0.6135347 0.3864653 0.016321906
## 6 0.0000000000 0.4482264 0.5517736 0.021595416
## POP_ONE_to_FOUR_RATIO POP_FIVE_to_NINE_RATIO POP_TEN_to_FOURTEEN_RATIO
## 1 0.06000000 0.08264151 0.09754717
## 2 0.04983149 0.06259027 0.08449687
## 3 0.06099850 0.08389640 0.10200826
## 4 0.04636049 0.06677860 0.08335139
## 5 0.06710786 0.09183181 0.10734606
## 6 0.07117673 0.09277215 0.10643455
## POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO
## 1 0.6683019 0.07849057 4.439263e-05 0.200420212
## 2 0.6521425 0.14179104 3.021528e-01 0.154995993
## 3 0.6471471 0.09290541 1.961845e-04 0.076606734
## 4 0.6487760 0.14520147 4.925746e-01 0.134163455
## 5 0.6451131 0.07227928 1.217111e-01 0.050785269
## 6 0.5797708 0.12825033 1.096352e-01 0.000145351
## GVA_SERVICES_RATIO GVA_PUBLIC_RATIO COM_A_RATIO COM_B_RATIO COM_C_RATIO
## 1 0.5352199344 0.2643155 0.017605634 0.003521127 0.19718310
## 2 0.3749013024 0.1679499 0.012605042 0.012605042 0.06302521
## 3 0.6328761374 0.2903209 0.017361111 0.031250000 0.09027778
## 4 0.0007457559 0.3725162 0.028985507 0.001610306 0.06441224
## 5 0.4056311479 0.4218725 0.004296455 0.002148228 0.04618690
## 6 0.0005638529 0.8896556 0.011627907 0.000000000 0.04651163
## COM_D_RATIO COM_E_RATIO COM_F_RATIO COM_G_RATIO COM_H_RATIO COM_I_RATIO
## 1 0.00000000 0.007042254 0.10211268 0.3873239 0.09154930 0.01408451
## 2 0.00210084 0.004201681 0.07142857 0.3991597 0.14705882 0.05882353
## 3 0.00000000 0.006944444 0.02430556 0.4062500 0.04166667 0.19791667
## 4 0.00000000 0.001610306 0.03220612 0.4879227 0.09983897 0.04830918
## 5 0.00000000 0.001074114 0.02900107 0.5370569 0.01718582 0.03329753
## 6 0.00000000 0.000000000 0.06976744 0.5581395 0.02325581 0.11627907
## COM_J_RATIO COM_K_RATIO COM_L_RATIO COM_M_RATIO COM_N_RATIO COM_O_RATIO
## 1 0.017605634 0.000000000 0.007042254 0.03521127 0.04225352 0.014084507
## 2 0.023109244 0.000000000 0.008403361 0.03151261 0.06092437 0.004201681
## 3 0.006944444 0.003472222 0.000000000 0.02430556 0.05208333 0.010416667
## 4 0.014492754 0.009661836 0.006441224 0.04508857 0.04347826 0.003220612
## 5 0.006444683 0.001074114 0.001074114 0.02363050 0.01718582 0.002148228
## 6 0.023255814 0.000000000 0.000000000 0.02325581 0.03488372 0.023255814
## COM_P_RATIO COM_Q_RATIO COM_R_RATIO COM_S_RATIO COM_U_RATIO POP_DENSITY
## 1 0.02112676 0.02112676 0.003521127 0.01760563 0 54.68559
## 2 0.01890756 0.02941176 0.012605042 0.03991597 0 7.98697
## 3 0.03819444 0.01736111 0.003472222 0.02777778 0 17.63130
## 4 0.02415459 0.03059581 0.014492754 0.04347826 0 12.97363
## 5 0.16648765 0.03544576 0.016111708 0.06015038 0 94.33086
## 6 0.00000000 0.02325581 0.000000000 0.04651163 0 63.76610
## geometry
## 1 POINT (-49.44055 -16.75881)
## 2 POINT (-47.39683 -18.48756)
## 3 POINT (-48.71881 -16.18267)
## 4 POINT (-45.44619 -19.15585)
## 5 POINT (-48.8844 -1.72347)
## 6 POINT (-39.04755 -7.356977)
We change the CRS to 4674 (SIRGAS 2000), as per the geobr documentation, so that the data points map accurately onto the municipality polygons of Brazil.
Validity_NA_Check(Brazil_cities.sf)
## [1] "For: Brazil_cities.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"
#muni.sf <- read_municipality(year=2010)
We load the 2010 municipality boundaries so that they align with the latitude and longitude data in our aspatial dataset, which is dated 2010. The call is commented out because we save the cleaned data locally to reduce processing time.
#Validity_NA_Check(muni.sf)
#muni.sf <- st_make_valid(muni.sf)
#Validity_NA_Check(muni.sf)
#muni.sp <- as_Spatial(muni.sf)
#writeOGR(muni.sp, "./data/geospatial", "Brazil_Muni", driver="ESRI Shapefile")
The lines above were commented out to reduce loading times. We load the file locally and check its validity.
tmap_mode("plot")
muni_loaded.sf <- st_read(dsn="data/geospatial", layer="Brazil_Muni")
## Reading layer `Brazil_Muni' from data source `D:\GSA\Take_Home_EX04\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 5567 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75208 xmax: -28.83609 ymax: 5.271841
## geographic CRS: GRS 1980(IUGG, 1980)
st_crs(muni_loaded.sf) <- 4674
qtm(muni_loaded.sf)
Validity_NA_Check(muni_loaded.sf)
## [1] "For: muni_loaded.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"
muni_loaded_w_unique.sf <- cbind(CITY_STATE_M = paste(muni_loaded.sf$name_mn, muni_loaded.sf$abbrv_s, sep="_"), muni_loaded.sf)
tm_shape(muni_loaded_w_unique.sf)+
tm_fill(col= "code_mn")+
tm_shape(Brazil_cities.sf)+
tm_dots(size = 0.01)
Based on the map above, we can observe that the points are accurately mapped to their respective municipalities in Brazil. We will create a combined data frame for the next phase, choropleth mapping.
#Brazil_cities.sf <- Brazil_cities.sf[!(Brazil_cities.sf$CITY_STATE =="Fernando De Noronha_PE"), ]
tmap_mode("plot")
Brazil_super.sf <- st_join(muni_loaded_w_unique.sf, Brazil_cities.sf, join=st_intersects)
Validity_NA_Check(Brazil_super.sf)
## [1] "For: Brazil_super.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 3"
Checking NA Row locations
temp_NA <- Brazil_super.sf[rowSums(is.na(Brazil_super.sf))!=0,]
as.character(temp_NA$name_mn)
## [1] "Santa Teresinha" "Lagoa Mirim" "Lagoa Dos Patos"
Based on the output above, 2 of the polygons with NA values are lakes, and the third is Santa Teresinha, which we removed during data cleaning because of missing values. The remaining polygons should therefore have data mapped to them correctly, unless any of them contains more than one point.
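The mechanism behind these NA rows can be sketched with an ordinary left join. The example below uses base R `merge` on a name key as a stand-in for the spatial join, so it is an analogy rather than the method used here. The GDP per capita values are taken from the `head` output above; the pairing with polygons is a toy setup.

```r
# Polygons: two municipalities and one lake that has no data point.
polygons <- data.frame(name_mn = c("Abaiara", "Lagoa Mirim", "Abaetetuba"))

# Points: attribute data exists only for the two municipalities.
points <- data.frame(
  name_mn    = c("Abaiara", "Abaetetuba"),
  GDP_CAPITA = c(6370.41, 8222.36)
)

# all.x = TRUE keeps every polygon; unmatched ones get NA attributes.
joined <- merge(polygons, points, by = "name_mn", all.x = TRUE)

# Same row filters as used in this section.
na_rows <- joined[rowSums(is.na(joined)) != 0, ]
cleaned <- joined[rowSums(is.na(joined)) == 0, ]

print(na_rows)
print(cleaned)
```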
Removing NA rows
Brazil_super_cleaned.sf<- Brazil_super.sf[rowSums(is.na(Brazil_super.sf))==0,]
Validity_NA_Check(Brazil_super_cleaned.sf)
## [1] "For: Brazil_super_cleaned.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"
Checking for duplicates
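# Check on the polygon code: the join adds no ".x" suffix, so CITY_STATE.x would be NULL and the check would be vacuous
dim(Brazil_super_cleaned.sf[duplicated(Brazil_super_cleaned.sf$code_mn),])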
## [1] 0 110
There are no duplicate rows, which means that each polygon has only one data point attached to it.
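A duplicate check of this kind only means something if the column it inspects exists. The toy sketch below (made-up codes and names) shows both a genuine check and the silent failure mode: `$` on a missing column returns `NULL`, and `duplicated(NULL)` flags nothing.

```r
# Toy joined table: polygon code 2 received two points.
toy <- data.frame(
  code_mn    = c(1, 2, 2, 3),
  CITY_STATE = c("A_X", "B_X", "C_X", "D_Y")
)

# Genuine check: one row is a repeat of an earlier polygon code.
dup_rows <- toy[duplicated(toy$code_mn), ]

# Vacuous check: the column does not exist, so nothing can be flagged.
missing_col  <- toy$CITY_STATE.x
vacuous_rows <- toy[duplicated(missing_col), ]

print(nrow(dup_rows))
print(is.null(missing_col))
print(nrow(vacuous_rows))
```

A cheap safeguard is to assert that the column exists, for example with `stopifnot("code_mn" %in% names(toy))`, before interpreting a result of zero rows.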
tmap_mode("plot")
tm_shape(Brazil_super_cleaned.sf)+
tm_fill(col= "GDP_CAPITA",
style="jenks",
title = "GDP per Capita",
palette ="Greens")+
tm_layout(main.title = "Distribution of GDP per Capita by Municipality \n(Jenks classification)",
main.title.position = "center",
main.title.size = 1,
legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) +
tm_borders(alpha = 0.1)
Based on the map above, we can see a surprising result for GDP per capita. The highest values appear in the satellite cities around Sao Paulo rather than in the main city itself. Additionally, far inland in areas such as Selviria and Campos De Júlio, we also see concentrations of higher GDP per capita. This could be due to a small population in regions that still generate a large amount of production, which is surprising given the large areas of these polygons.
What is even more surprising is that the two main cities in Brazil, Rio De Janeiro and Sao Paulo, have GDP per capita of only 50,690 and 57,071 respectively. This is most likely due to a much larger population concentrated in these smaller areas, which is concerning from a social development standpoint.
dropsAbrev <- c("CITY_STATE_M", "code_mn", "name_mn", "cod_stt", "abbrv_s", "CITY", "STATE")
Brazil_reg.sf <- Brazil_super_cleaned.sf[ , !(names(Brazil_super_cleaned.sf) %in% dropsAbrev)]
Brazil_numeric_vars <- cbind(Brazil_reg.sf[,3:28]%>%
st_set_geometry(NULL), Brazil_reg.sf[,32:52]%>%
st_set_geometry(NULL), Brazil_reg.sf[,102]%>%
st_set_geometry(NULL))
Brazil_numeric_vars.norm <- normalize(Brazil_numeric_vars)
Brazil_Ratios_vars <- Brazil_reg.sf[,68:101] %>%
st_set_geometry(NULL)
Brazil_Categorical_vars <- cbind(Brazil_reg.sf[,2]%>%
st_set_geometry(NULL), Brazil_reg.sf[,53:67]%>%
st_set_geometry(NULL))
dropsReg <- c("CITY_STATE", "GDP", "GDP_CAPITA", "POP_GDP")
Brazil_All_vars <- Brazil_reg.sf[ , !(names(Brazil_reg.sf) %in% dropsReg)] %>%
st_set_geometry(NULL)
corrplot(cor(Brazil_numeric_vars.norm, use = "complete.obs"), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")
corrplot(cor(Brazil_Ratios_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")
corrplot(cor(Brazil_Categorical_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")
# The plot of all variables is omitted for display reasons, although it was checked during the analysis to ensure that no variables are too highly correlated
# corrplot(cor(Brazil_All_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
#tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")
As expected, a number of indicators in our numeric dataset are clearly highly correlated with one another, notably the IBGE, GVA, TAXES and COMP numbers. Because of their correlation with COMP_TOT, we will use it as a single metric to capture them, as it is a likely driver of those variables (particularly taxes). We will also use IDHM as a measure for all the IDHM indicators, although there will be some loss of information.
Within the ratios, the youth population ratios are very highly correlated. As these are ratios of the same total, we can sum them into a new youth metric. Additionally, because DOM_RURAL_RATIO and DOM_URBAN_RATIO, and likewise RES_BRAZ_POP_RATIO and RES_FOREIGN_POP_RATIO, are complements of each other, we need only one from each pair as an indicator. We will choose the foreign population ratio and the urban domestic units ratio.
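Both points can be checked numerically. The sketch below uses the six DOM_URBAN_RATIO values and the four youth ratios of the first row from the `head` output earlier in this section; it is an illustration on a handful of rows, not a test on the full dataset.

```r
# DOM_URBAN_RATIO for the first six municipalities (from the head output).
urban <- c(0.7234441, 0.6361684, 0.6945220, 0.8665194, 0.6135347, 0.4482264)

# Urban and rural domestic units partition the total, so rural = 1 - urban.
rural <- 1 - urban

# Complementary shares are perfectly negatively correlated.
complement_cor <- cor(urban, rural)

# Shares of the same total can be added: youth ratio for the first municipality.
youth_parts <- c(0.013018868, 0.06000000, 0.08264151, 0.09754717)
youth_ratio <- sum(youth_parts)

print(complement_cor)
print(youth_ratio)
```

Since one variable of a complementary pair is an exact linear function of the other, keeping both would add no information and would make the regression design matrix rank deficient.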
Brazil_numeric_vars_pro <- Brazil_numeric_vars.norm %>% select("ALT", "AREA", "IDHM", "POP_DENSITY", "COMP_TOT")
Brazil_Ratios_vars_pro <- Brazil_Ratios_vars %>%
mutate( POP_YOUTH_RATIO = as.numeric((POP_BEL_ONE_RATIO + POP_ONE_to_FOUR_RATIO + POP_FIVE_to_NINE_RATIO + POP_TEN_to_FOURTEEN_RATIO)))
dropsRatios <- c("POP_BEL_ONE_RATIO", "POP_ONE_to_FOUR_RATIO", "POP_FIVE_to_NINE_RATIO", "POP_TEN_to_FOURTEEN_RATIO", "RES_BRAZ_POP_RATIO", "DOM_RURAL_RATIO")
Brazil_Ratios_vars_pro <- Brazil_Ratios_vars_pro[ , !(names(Brazil_Ratios_vars_pro) %in% dropsRatios)]
Brazil_indicators <- cbind(Brazil_Ratios_vars_pro, Brazil_Categorical_vars, Brazil_numeric_vars_pro)
corrplot(cor(Brazil_indicators, use = "complete.obs"), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.4, number.cex= 0.3, method = "number", type = "upper")
Based on our correlation plot, we do not see any pairs of variables with a correlation beyond 0.75. As such, we will use these variables in our regression.
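Reading a large correlation plot by eye is error prone, so the 0.75 screen can also be done programmatically. The following is a sketch on a toy data frame with hypothetical variables `x`, `y` and `z`; it lists every pair whose absolute correlation exceeds the threshold.

```r
# Toy data: y is an exact linear function of x, z is only weakly related.
toy <- data.frame(
  x = 1:10,
  y = 2 * (1:10) + 1,
  z = rep(c(1, -1), times = 5)
)

threshold <- 0.75
cm <- cor(toy)

# Keep only the upper triangle so each pair is reported once.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Row and column indices of the pairs above the threshold.
high <- which(abs(cm) > threshold, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)

print(high_pairs)
```

An empty `high_pairs` table would correspond to the situation described above, where no pair of indicators is correlated beyond 0.75.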
polygon_frame <- Brazil_reg.sf %>% select("CITY_STATE")
joining_frame <- Brazil_reg.sf %>% select("CITY_STATE", "GDP_CAPITA") %>% st_set_geometry(NULL)
joining_frame_states <- cbind(joining_frame, Brazil_indicators)
Brazil_Indicators.sf <- left_join(polygon_frame, joining_frame_states, by="CITY_STATE") ## cbind() above relies on row order; usually you would join on an index, but after checking the data we find that the rows align with Brazil_reg.sf, so we can treat the indicators as correctly joined to the original sf
Validity_NA_Check(Brazil_Indicators.sf)
## [1] "For: Brazil_Indicators.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"
summary(Brazil_Indicators.sf)
## CITY_STATE GDP_CAPITA RES_FOREIGN_POP_RATIO
## Abadia De Goiás_GO : 1 Min. : 3191 Min. :0.0000000
## Abadia Dos Dourados_MG: 1 1st Qu.: 9062 1st Qu.:0.0000000
## Abadiânia_GO : 1 Median : 15870 Median :0.0000000
## Abaeté_MG : 1 Mean : 21122 Mean :0.0007593
## Abaetetuba_PA : 1 3rd Qu.: 26155 3rd Qu.:0.0006992
## Abaiara_CE : 1 Max. :314638 Max. :0.3772182
## (Other) :5558
## DOM_URBAN_RATIO POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO
## Min. :0.04553 Min. :0.4716 Min. :0.02255 Min. :0.00000
## 1st Qu.:0.49148 1st Qu.:0.6087 1st Qu.:0.09799 1st Qu.:0.03364
## Median :0.66263 Median :0.6325 Median :0.11921 Median :0.15062
## Mean :0.65205 Mean :0.6308 Mean :0.12009 Mean :0.21034
## 3rd Qu.:0.83040 3rd Qu.:0.6543 3rd Qu.:0.14103 3rd Qu.:0.34094
## Max. :1.00000 Max. :0.7448 Max. :0.42199 Max. :0.99877
##
## GVA_INDUSTRY_RATIO GVA_SERVICES_RATIO GVA_PUBLIC_RATIO COM_A_RATIO
## Min. :0.0000157 Min. :0.0000461 Min. :0.0000433 Min. :0.000000
## 1st Qu.:0.0368730 1st Qu.:0.1985910 1st Qu.:0.1448472 1st Qu.:0.001569
## Median :0.0714602 Median :0.3117002 Median :0.2948082 Median :0.011803
## Mean :0.1377745 Mean :0.3260963 Mean :0.3257928 Mean :0.039408
## 3rd Qu.:0.1795132 3rd Qu.:0.4600063 3rd Qu.:0.4966551 3rd Qu.:0.031915
## Max. :0.9991868 Max. :0.9995977 Max. :0.9996029 Max. :0.917085
##
## COM_B_RATIO COM_C_RATIO COM_D_RATIO COM_E_RATIO
## Min. :0.000000 Min. :0.00000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.03636 1st Qu.:0.0000000 1st Qu.:0.000000
## Median :0.000000 Median :0.06590 Median :0.0000000 Median :0.000000
## Mean :0.006019 Mean :0.07967 Mean :0.0007847 Mean :0.002508
## 3rd Qu.:0.005188 3rd Qu.:0.10593 3rd Qu.:0.0000000 3rd Qu.:0.003226
## Max. :0.333333 Max. :0.54518 Max. :0.4444444 Max. :0.083333
##
## COM_F_RATIO COM_G_RATIO COM_H_RATIO COM_I_RATIO
## Min. :0.00000 Min. :0.01789 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.01389 1st Qu.:0.38980 1st Qu.:0.01562 1st Qu.:0.02128
## Median :0.02778 Median :0.46396 Median :0.03757 Median :0.04167
## Mean :0.03130 Mean :0.47234 Mean :0.04955 Mean :0.04567
## 3rd Qu.:0.04348 3rd Qu.:0.55263 3rd Qu.:0.07052 3rd Qu.:0.06202
## Max. :0.29213 Max. :0.89091 Max. :0.43689 Max. :0.52542
##
## COM_J_RATIO COM_K_RATIO COM_L_RATIO COM_M_RATIO
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.01144
## Median :0.007299 Median :0.000000 Median :0.000000 Median :0.02362
## Mean :0.009054 Mean :0.003933 Mean :0.005450 Mean :0.02536
## 3rd Qu.:0.013982 3rd Qu.:0.006112 3rd Qu.:0.008601 3rd Qu.:0.03659
## Max. :0.417249 Max. :0.087912 Max. :0.156863 Max. :0.24444
##
## COM_N_RATIO COM_O_RATIO COM_P_RATIO COM_Q_RATIO
## Min. :0.00000 Min. :0.0001764 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.01802 1st Qu.:0.0058954 1st Qu.:0.01786 1st Qu.:0.006615
## Median :0.02924 Median :0.0153846 Median :0.02985 Median :0.019946
## Mean :0.03553 Mean :0.0277867 Mean :0.04350 Mean :0.022028
## 3rd Qu.:0.04496 3rd Qu.:0.0361664 3rd Qu.:0.04878 3rd Qu.:0.033033
## Max. :0.33527 Max. :0.3636364 Max. :0.83673 Max. :0.214286
##
## COM_R_RATIO COM_S_RATIO COM_U_RATIO POP_YOUTH_RATIO
## Min. :0.000000 Min. :0.00000 Min. :0.000e+00 Min. :0.1064
## 1st Qu.:0.000000 1st Qu.:0.04116 1st Qu.:0.000e+00 1st Qu.:0.2153
## Median :0.009091 Median :0.06395 Median :0.000e+00 Median :0.2452
## Mean :0.010772 Mean :0.08933 Mean :2.036e-06 Mean :0.2491
## 3rd Qu.:0.015310 3rd Qu.:0.11147 3rd Qu.:0.000e+00 3rd Qu.:0.2771
## Max. :0.166667 Max. :0.56716 Max. :2.985e-03 Max. :0.4408
##
## CAPITAL RURAL_URBANIntermediário Adjacente
## Min. :0.000000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.0000
## Median :0.000000 Median :0.0000
## Mean :0.004853 Mean :0.1233
## 3rd Qu.:0.000000 3rd Qu.:0.0000
## Max. :1.000000 Max. :1.0000
##
## RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :1.0000
## Mean :0.01078 Mean :0.5462
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000
##
## RURAL_URBANRural Remoto RURAL_URBANUrbano
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000
## Mean :0.05805 Mean :0.2617
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000
##
## GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4892
## 3rd Qu.:1.0000
## Max. :1.0000
##
## GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1317
## 3rd Qu.:0.0000
## Max. :1.0000
##
## GVA_MAINComércio e reparação de veículos automotores e motocicletas
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.000000
## Mean :0.008267
## 3rd Qu.:0.000000
## Max. :1.000000
##
## GVA_MAINConstrução GVA_MAINDemais serviços
## Min. :0.000000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.0000
## Median :0.000000 Median :0.0000
## Mean :0.001258 Mean :0.2653
## 3rd Qu.:0.000000 3rd Qu.:1.0000
## Max. :1.000000 Max. :1.0000
##
## GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.01761
## 3rd Qu.:0.00000
## Max. :1.00000
##
## GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000
## Mean :0.04691 Mean :0.00629
## 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000
##
## GVA_MAINPecuária, inclusive apoio à pecuária
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.02894
## 3rd Qu.:0.00000
## Max. :1.00000
##
## GVA_MAINProdução florestal, pesca e aquicultura ALT
## Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.0001941
## Median :0.000000 Median :0.0004648
## Mean :0.004493 Mean :0.0010222
## 3rd Qu.:0.000000 3rd Qu.:0.0007193
## Max. :1.000000 Max. :1.0000000
##
## AREA IDHM POP_DENSITY COMP_TOT
## Min. :0.000000 Min. :0.0000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.001260 1st Qu.:0.4077 1st Qu.:0.000876 1st Qu.:0.0001169
## Median :0.002589 Median :0.5563 Median :0.001864 Median :0.0002941
## Mean :0.009539 Mean :0.5432 Mean :0.008659 Mean :0.0016997
## 3rd Qu.:0.006412 3rd Qu.:0.6757 3rd Qu.:0.004091 3rd Qu.:0.0008356
## Max. :1.000000 Max. :1.0000 Max. :1.000000 Max. :1.0000000
##
## geometry
## MULTIPOLYGON :5564
## epsg:4674 : 0
## +proj=long...: 0
##
##
##
##
When performing a multiple linear regression, we need to define our hypotheses:
* Null hypothesis: none of the explanatory variables has a linear relationship with GDP per capita (all slope coefficients are zero)
* Alternative hypothesis: at least one explanatory variable has a non-zero slope coefficient
We will use a confidence level of 95% for this analysis, meaning we need a p-value below 0.05 (our alpha) in order to reject the null hypothesis.
Because we have categorical dummy variables and groups of ratios that sum to 1, one column in each group has to act as the baseline. We leave this to lm(), which drops the last column of each perfectly collinear group and reports its coefficient as NA:
GDPPC.mlr<- lm(GDP_CAPITA ~ ., data=Brazil_Indicators.sf[2:53] %>% st_set_geometry(NULL))
summary(GDPPC.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_Indicators.sf[2:53] %>%
## st_set_geometry(NULL))
##
## Residuals:
## Min 1Q Median 3Q Max
## -40924 -5198 -713 3256 246925
##
## Coefficients: (5 not defined because of singularities)
## Estimate
## (Intercept) 5457565.3
## RES_FOREIGN_POP_RATIO -5490.9
## DOM_URBAN_RATIO -1953.4
## POP_WORKING_RATIO 37961.8
## POP_ELDERLY_RATIO -34416.2
## GVA_AGROPEC_RATIO 8195.2
## GVA_INDUSTRY_RATIO 22856.5
## GVA_SERVICES_RATIO 4935.0
## GVA_PUBLIC_RATIO NA
## COM_A_RATIO -5474832.5
## COM_B_RATIO -5510719.9
## COM_C_RATIO -5504450.0
## COM_D_RATIO -5436381.1
## COM_E_RATIO -5436230.7
## COM_F_RATIO -5477177.6
## COM_G_RATIO -5479871.4
## COM_H_RATIO -5458575.7
## COM_I_RATIO -5467005.8
## COM_J_RATIO -5467998.6
## COM_K_RATIO -5364519.1
## COM_L_RATIO -5373763.3
## COM_M_RATIO -5460390.8
## COM_N_RATIO -5466424.3
## COM_O_RATIO -5461544.0
## COM_P_RATIO -5478092.3
## COM_Q_RATIO -5487459.3
## COM_R_RATIO -5479679.6
## COM_S_RATIO -5477920.3
## COM_U_RATIO NA
## POP_YOUTH_RATIO NA
## CAPITAL -8485.0
## `RURAL_URBANIntermediário Adjacente` -370.4
## `RURAL_URBANIntermediário Remoto` 4839.7
## `RURAL_URBANRural Adjacente` 1555.1
## `RURAL_URBANRural Remoto` 4727.9
## RURAL_URBANUrbano NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` -9635.6
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita` 2708.8
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 21939.3
## GVA_MAINConstrução -7020.6
## `GVA_MAINDemais serviços` -8217.5
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 19968.9
## `GVA_MAINIndústrias de transformação` 13099.6
## `GVA_MAINIndústrias extrativas` 14549.4
## `GVA_MAINPecuária, inclusive apoio à pecuária` -5565.2
## `GVA_MAINProdução florestal, pesca e aquicultura` NA
## ALT -5481.8
## AREA 8959.2
## IDHM 36695.3
## POP_DENSITY 7831.5
## COMP_TOT 36690.6
## Std. Error
## (Intercept) 5758288.1
## RES_FOREIGN_POP_RATIO 56651.7
## DOM_URBAN_RATIO 1462.6
## POP_WORKING_RATIO 10630.0
## POP_ELDERLY_RATIO 7766.5
## GVA_AGROPEC_RATIO 1309.1
## GVA_INDUSTRY_RATIO 1634.1
## GVA_SERVICES_RATIO 1161.4
## GVA_PUBLIC_RATIO NA
## COM_A_RATIO 5758215.0
## COM_B_RATIO 5758386.0
## COM_C_RATIO 5758212.1
## COM_D_RATIO 5758474.6
## COM_E_RATIO 5758512.8
## COM_F_RATIO 5758246.2
## COM_G_RATIO 5758265.9
## COM_H_RATIO 5758277.5
## COM_I_RATIO 5757930.3
## COM_J_RATIO 5758076.2
## COM_K_RATIO 5757653.4
## COM_L_RATIO 5757834.4
## COM_M_RATIO 5758161.2
## COM_N_RATIO 5757990.4
## COM_O_RATIO 5758278.8
## COM_P_RATIO 5758244.6
## COM_Q_RATIO 5758228.4
## COM_R_RATIO 5758306.1
## COM_S_RATIO 5758244.3
## COM_U_RATIO NA
## POP_YOUTH_RATIO NA
## CAPITAL 3355.8
## `RURAL_URBANIntermediário Adjacente` 738.4
## `RURAL_URBANIntermediário Remoto` 2083.3
## `RURAL_URBANRural Adjacente` 695.9
## `RURAL_URBANRural Remoto` 1085.4
## RURAL_URBANUrbano NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` 2983.5
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita` 2991.4
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 3698.9
## GVA_MAINConstrução 6278.1
## `GVA_MAINDemais serviços` 3025.8
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 3389.8
## `GVA_MAINIndústrias de transformação` 3168.8
## `GVA_MAINIndústrias extrativas` 3923.8
## `GVA_MAINPecuária, inclusive apoio à pecuária` 3170.5
## `GVA_MAINProdução florestal, pesca e aquicultura` NA
## ALT 9989.1
## AREA 6234.3
## IDHM 2842.5
## POP_DENSITY 4947.6
## COMP_TOT 14889.5
## t value
## (Intercept) 0.948
## RES_FOREIGN_POP_RATIO -0.097
## DOM_URBAN_RATIO -1.336
## POP_WORKING_RATIO 3.571
## POP_ELDERLY_RATIO -4.431
## GVA_AGROPEC_RATIO 6.260
## GVA_INDUSTRY_RATIO 13.987
## GVA_SERVICES_RATIO 4.249
## GVA_PUBLIC_RATIO NA
## COM_A_RATIO -0.951
## COM_B_RATIO -0.957
## COM_C_RATIO -0.956
## COM_D_RATIO -0.944
## COM_E_RATIO -0.944
## COM_F_RATIO -0.951
## COM_G_RATIO -0.952
## COM_H_RATIO -0.948
## COM_I_RATIO -0.949
## COM_J_RATIO -0.950
## COM_K_RATIO -0.932
## COM_L_RATIO -0.933
## COM_M_RATIO -0.948
## COM_N_RATIO -0.949
## COM_O_RATIO -0.948
## COM_P_RATIO -0.951
## COM_Q_RATIO -0.953
## COM_R_RATIO -0.952
## COM_S_RATIO -0.951
## COM_U_RATIO NA
## POP_YOUTH_RATIO NA
## CAPITAL -2.528
## `RURAL_URBANIntermediário Adjacente` -0.502
## `RURAL_URBANIntermediário Remoto` 2.323
## `RURAL_URBANRural Adjacente` 2.235
## `RURAL_URBANRural Remoto` 4.356
## RURAL_URBANUrbano NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` -3.230
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita` 0.906
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 5.931
## GVA_MAINConstrução -1.118
## `GVA_MAINDemais serviços` -2.716
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 5.891
## `GVA_MAINIndústrias de transformação` 4.134
## `GVA_MAINIndústrias extrativas` 3.708
## `GVA_MAINPecuária, inclusive apoio à pecuária` -1.755
## `GVA_MAINProdução florestal, pesca e aquicultura` NA
## ALT -0.549
## AREA 1.437
## IDHM 12.910
## POP_DENSITY 1.583
## COMP_TOT 2.464
## Pr(>|t|)
## (Intercept) 0.343285
## RES_FOREIGN_POP_RATIO 0.922791
## DOM_URBAN_RATIO 0.181731
## POP_WORKING_RATIO 0.000358
## POP_ELDERLY_RATIO 9.55e-06
## GVA_AGROPEC_RATIO 4.14e-10
## GVA_INDUSTRY_RATIO < 2e-16
## GVA_SERVICES_RATIO 2.18e-05
## GVA_PUBLIC_RATIO NA
## COM_A_RATIO 0.341754
## COM_B_RATIO 0.338614
## COM_C_RATIO 0.339149
## COM_D_RATIO 0.345177
## COM_E_RATIO 0.345194
## COM_F_RATIO 0.341550
## COM_G_RATIO 0.341315
## COM_H_RATIO 0.343195
## COM_I_RATIO 0.342421
## COM_J_RATIO 0.342346
## COM_K_RATIO 0.351522
## COM_L_RATIO 0.350708
## COM_M_RATIO 0.343025
## COM_N_RATIO 0.342477
## COM_O_RATIO 0.342933
## COM_P_RATIO 0.341470
## COM_Q_RATIO 0.340643
## COM_R_RATIO 0.341335
## COM_S_RATIO 0.341485
## COM_U_RATIO NA
## POP_YOUTH_RATIO NA
## CAPITAL 0.011484
## `RURAL_URBANIntermediário Adjacente` 0.615964
## `RURAL_URBANIntermediário Remoto` 0.020211
## `RURAL_URBANRural Adjacente` 0.025484
## `RURAL_URBANRural Remoto` 1.35e-05
## RURAL_URBANUrbano NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` 0.001247
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita` 0.365227
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 3.19e-09
## GVA_MAINConstrução 0.263506
## `GVA_MAINDemais serviços` 0.006632
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 4.07e-09
## `GVA_MAINIndústrias de transformação` 3.62e-05
## `GVA_MAINIndústrias extrativas` 0.000211
## `GVA_MAINPecuária, inclusive apoio à pecuária` 0.079260
## `GVA_MAINProdução florestal, pesca e aquicultura` NA
## ALT 0.583182
## AREA 0.150749
## IDHM < 2e-16
## POP_DENSITY 0.113508
## COMP_TOT 0.013763
##
## (Intercept)
## RES_FOREIGN_POP_RATIO
## DOM_URBAN_RATIO
## POP_WORKING_RATIO ***
## POP_ELDERLY_RATIO ***
## GVA_AGROPEC_RATIO ***
## GVA_INDUSTRY_RATIO ***
## GVA_SERVICES_RATIO ***
## GVA_PUBLIC_RATIO
## COM_A_RATIO
## COM_B_RATIO
## COM_C_RATIO
## COM_D_RATIO
## COM_E_RATIO
## COM_F_RATIO
## COM_G_RATIO
## COM_H_RATIO
## COM_I_RATIO
## COM_J_RATIO
## COM_K_RATIO
## COM_L_RATIO
## COM_M_RATIO
## COM_N_RATIO
## COM_O_RATIO
## COM_P_RATIO
## COM_Q_RATIO
## COM_R_RATIO
## COM_S_RATIO
## COM_U_RATIO
## POP_YOUTH_RATIO
## CAPITAL *
## `RURAL_URBANIntermediário Adjacente`
## `RURAL_URBANIntermediário Remoto` *
## `RURAL_URBANRural Adjacente` *
## `RURAL_URBANRural Remoto` ***
## RURAL_URBANUrbano
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` **
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` ***
## GVA_MAINConstrução
## `GVA_MAINDemais serviços` **
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` ***
## `GVA_MAINIndústrias de transformação` ***
## `GVA_MAINIndústrias extrativas` ***
## `GVA_MAINPecuária, inclusive apoio à pecuária` .
## `GVA_MAINProdução florestal, pesca e aquicultura`
## ALT
## AREA
## IDHM ***
## POP_DENSITY
## COMP_TOT *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14600 on 5518 degrees of freedom
## Multiple R-squared: 0.488, Adjusted R-squared: 0.4838
## F-statistic: 116.9 on 45 and 5518 DF, p-value: < 2.2e-16
Based on the F-statistic, our model has a p-value below 0.05, so we can reject the null hypothesis that the mean of GDP per capita alone explains the data as well as our model does.
The company type ratios do not appear to contribute significantly to GDP per capita. Additionally, the altitude and area of the municipality show no significance. The same is seen for population density, the ratio of foreigners in the population and the share of urban households. Some of the GVA main categories are also not statistically significant, and we will remove them. Lastly, the urban or rural classifications seem to have some significance, except for Intermediário Adjacente, which is likely because its definition sits in between many of the others; we keep all of these classifications for the next run.
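As a quick sanity check, the overall F-statistic can be recomputed from the R-squared and the degrees of freedom printed in the summary above, using F = (R² / k) / ((1 - R²) / df_residual). The sketch below plugs in the rounded values from that output, so the result only matches to the printed precision.

```r
# Values copied from the summary output above (rounded as printed)
r2       <- 0.488   # Multiple R-squared
k        <- 45      # numerator degrees of freedom (estimated slopes)
df_resid <- 5518    # residual degrees of freedom

# Overall F-statistic and its upper-tail p-value
f_stat  <- (r2 / k) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)

print(f_stat)          # close to the reported 116.9
print(p_value < 0.05)  # TRUE means the null hypothesis is rejected
```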
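The screening above can also be done in code rather than by reading the table. The sketch below uses toy data with hypothetical column names; it reproduces the two things seen in the summary: a group of shares that sums to 1 produces an NA (aliased) coefficient, and the remaining terms can be filtered by p-value.

```r
# Toy data: three shares that sum to 1, plus one unrelated predictor
set.seed(7)
n  <- 500
s1 <- runif(n, 0, 0.5)
s2 <- runif(n, 0, 0.5)
s3 <- 1 - s1 - s2                  # perfectly collinear with intercept, s1, s2
noise <- rnorm(n)
y  <- 10 + 5 * s1 - 3 * s2 + rnorm(n, sd = 0.5)
toy <- data.frame(y, s1, s2, s3, noise)

fit <- lm(y ~ ., data = toy)

# Terms lm() could not estimate because of the singularity
coefs   <- coef(fit)
aliased <- names(coefs)[is.na(coefs)]
print(aliased)

# Coefficient table (aliased terms are omitted); keep terms with p < 0.05
ctab  <- coef(summary(fit))
pvals <- ctab[-1, "Pr(>|t|)"]      # drop the intercept row
keep  <- names(pvals)[pvals < 0.05]
print(keep)
```

Filtering purely on p-values is a simplification; it is shown here only because it mirrors the manual selection made in this analysis.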
Brazil_sig_Indic.sf <- Brazil_Indicators.sf %>% select("CITY_STATE", "GDP_CAPITA", "POP_WORKING_RATIO", "POP_ELDERLY_RATIO","GVA_AGROPEC_RATIO", "GVA_INDUSTRY_RATIO", "GVA_SERVICES_RATIO", "CAPITAL", "RURAL_URBANIntermediário Adjacente", "RURAL_URBANIntermediário Remoto", "RURAL_URBANRural Adjacente", "RURAL_URBANRural Remoto", "GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social", "GVA_MAINComércio e reparação de veículos automotores e motocicletas", "GVA_MAINDemais serviços", "GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação", "GVA_MAINIndústrias de transformação", "GVA_MAINIndústrias extrativas", "IDHM", "COMP_TOT")
GDPPC_sig.mlr<- lm(GDP_CAPITA ~ ., data=Brazil_sig_Indic.sf[2:21] %>% st_set_geometry(NULL))
summary(GDPPC_sig.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:21] %>%
## st_set_geometry(NULL))
##
## Residuals:
## Min 1Q Median 3Q Max
## -42585 -5379 -942 3078 252473
##
## Coefficients:
## Estimate
## (Intercept) -13666.5
## POP_WORKING_RATIO 28125.7
## POP_ELDERLY_RATIO -43172.8
## GVA_AGROPEC_RATIO 8705.5
## GVA_INDUSTRY_RATIO 22762.2
## GVA_SERVICES_RATIO 5337.2
## CAPITAL -4722.8
## `RURAL_URBANIntermediário Adjacente` -1085.7
## `RURAL_URBANIntermediário Remoto` 4783.1
## `RURAL_URBANRural Adjacente` 1129.0
## `RURAL_URBANRural Remoto` 4452.0
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` -11208.5
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 21417.8
## `GVA_MAINDemais serviços` -9095.2
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 19564.9
## `GVA_MAINIndústrias de transformação` 11273.0
## `GVA_MAINIndústrias extrativas` 15435.8
## IDHM 39405.4
## COMP_TOT 55830.7
## Std. Error
## (Intercept) 6122.4
## POP_WORKING_RATIO 10242.9
## POP_ELDERLY_RATIO 7379.9
## GVA_AGROPEC_RATIO 1296.0
## GVA_INDUSTRY_RATIO 1636.9
## GVA_SERVICES_RATIO 1153.6
## CAPITAL 3279.7
## `RURAL_URBANIntermediário Adjacente` 732.7
## `RURAL_URBANIntermediário Remoto` 2012.5
## `RURAL_URBANRural Adjacente` 627.3
## `RURAL_URBANRural Remoto` 1037.5
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` 707.0
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 2295.3
## `GVA_MAINDemais serviços` 774.5
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 1733.3
## `GVA_MAINIndústrias de transformação` 1222.3
## `GVA_MAINIndústrias extrativas` 2652.6
## IDHM 2416.5
## COMP_TOT 14553.1
## t value
## (Intercept) -2.232
## POP_WORKING_RATIO 2.746
## POP_ELDERLY_RATIO -5.850
## GVA_AGROPEC_RATIO 6.717
## GVA_INDUSTRY_RATIO 13.906
## GVA_SERVICES_RATIO 4.627
## CAPITAL -1.440
## `RURAL_URBANIntermediário Adjacente` -1.482
## `RURAL_URBANIntermediário Remoto` 2.377
## `RURAL_URBANRural Adjacente` 1.800
## `RURAL_URBANRural Remoto` 4.291
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` -15.853
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` 9.331
## `GVA_MAINDemais serviços` -11.743
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 11.288
## `GVA_MAINIndústrias de transformação` 9.223
## `GVA_MAINIndústrias extrativas` 5.819
## IDHM 16.307
## COMP_TOT 3.836
## Pr(>|t|)
## (Intercept) 0.025639
## POP_WORKING_RATIO 0.006054
## POP_ELDERLY_RATIO 5.19e-09
## GVA_AGROPEC_RATIO 2.04e-11
## GVA_INDUSTRY_RATIO < 2e-16
## GVA_SERVICES_RATIO 3.80e-06
## CAPITAL 0.149924
## `RURAL_URBANIntermediário Adjacente` 0.138455
## `RURAL_URBANIntermediário Remoto` 0.017506
## `RURAL_URBANRural Adjacente` 0.071941
## `RURAL_URBANRural Remoto` 1.81e-05
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` < 2e-16
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` < 2e-16
## `GVA_MAINDemais serviços` < 2e-16
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` < 2e-16
## `GVA_MAINIndústrias de transformação` < 2e-16
## `GVA_MAINIndústrias extrativas` 6.25e-09
## IDHM < 2e-16
## COMP_TOT 0.000126
##
## (Intercept) *
## POP_WORKING_RATIO **
## POP_ELDERLY_RATIO ***
## GVA_AGROPEC_RATIO ***
## GVA_INDUSTRY_RATIO ***
## GVA_SERVICES_RATIO ***
## CAPITAL
## `RURAL_URBANIntermediário Adjacente`
## `RURAL_URBANIntermediário Remoto` *
## `RURAL_URBANRural Adjacente` .
## `RURAL_URBANRural Remoto` ***
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social` ***
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas` ***
## `GVA_MAINDemais serviços` ***
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` ***
## `GVA_MAINIndústrias de transformação` ***
## `GVA_MAINIndústrias extrativas` ***
## IDHM ***
## COMP_TOT ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14840 on 5545 degrees of freedom
## Multiple R-squared: 0.4685, Adjusted R-squared: 0.4668
## F-statistic: 271.5 on 18 and 5545 DF, p-value: < 2.2e-16
Based on our new regression, some of the variables have become insignificant at the 0.05 level: the CAPITAL classification, and the Intermediário Adjacente and Rural Adjacente classifications. We will run the regression again without them.
dropsInsig <- c("CAPITAL", "RURAL_URBANIntermediário Adjacente", "RURAL_URBANRural Adjacente")
Brazil_sig_Indic.sf <- Brazil_sig_Indic.sf[ , !(names(Brazil_sig_Indic.sf) %in% dropsInsig)]
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'RURAL_URBANIntermediário Remoto'] <- 'CAT_INTERMEDIATE_REMOTE'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'RURAL_URBANRural Remoto'] <- 'CAT_RURAL_REMOTE'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social'] <- 'GVA_MAIN_Public_Sector'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINComércio e reparação de veículos automotores e motocicletas'] <- 'GVA_MAIN_Commercial'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINDemais serviços'] <- 'GVA_MAIN_Other_services'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação'] <- 'GVA_MAIN_Public_Utilities'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINIndústrias de transformação'] <- 'GVA_MAIN_Industry_transformation'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINIndústrias extrativas'] <- 'GVA_MAIN_Industrial'
GDPPC_sig2.mlr<- lm(GDP_CAPITA ~ ., data=Brazil_sig_Indic.sf[2:18] %>% st_set_geometry(NULL))
summary(GDPPC_sig2.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:18] %>%
## st_set_geometry(NULL))
##
## Residuals:
## Min 1Q Median 3Q Max
## -42091 -5367 -884 3055 252671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13922.8 6105.4 -2.280 0.022622 *
## POP_WORKING_RATIO 29840.1 10241.3 2.914 0.003586 **
## POP_ELDERLY_RATIO -37947.9 6994.2 -5.426 6.02e-08 ***
## GVA_AGROPEC_RATIO 9008.2 1287.0 6.999 2.88e-12 ***
## GVA_INDUSTRY_RATIO 22507.7 1633.3 13.780 < 2e-16 ***
## GVA_SERVICES_RATIO 4989.7 1148.6 4.344 1.42e-05 ***
## CAT_INTERMEDIATE_REMOTE 4304.2 1967.3 2.188 0.028725 *
## CAT_RURAL_REMOTE 3815.4 908.6 4.199 2.72e-05 ***
## GVA_MAIN_Public_Sector -11314.6 706.3 -16.019 < 2e-16 ***
## GVA_MAIN_Commercial 21167.5 2293.4 9.230 < 2e-16 ***
## GVA_MAIN_Other_services -9509.9 751.3 -12.657 < 2e-16 ***
## GVA_MAIN_Public_Utilities 19421.8 1734.1 11.200 < 2e-16 ***
## GVA_MAIN_Industry_transformation 11100.7 1220.7 9.094 < 2e-16 ***
## GVA_MAIN_Industrial 15348.7 2654.9 5.781 7.82e-09 ***
## IDHM 38163.3 2351.1 16.232 < 2e-16 ***
## COMP_TOT 46378.4 12894.6 3.597 0.000325 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14850 on 5548 degrees of freedom
## Multiple R-squared: 0.4671, Adjusted R-squared: 0.4657
## F-statistic: 324.2 on 15 and 5548 DF, p-value: < 2.2e-16
In the final regression we have an adjusted R-squared of 0.4657, which is quite low and means that the majority of the variation in GDP per capita is still unexplained by our model. The adjusted R-squared has also decreased slightly as we refined the model (0.4838, then 0.4668, then 0.4657). The F-statistic still shows that the model rejects the null hypothesis that the mean explains the dependent variable as well as the model does.
Per the regression above, which we validate below, each of the remaining variables has a measurable association with GDP per capita. Unsurprisingly, the total number of companies is significantly and positively associated with GDP per capita. This is most probably because more companies mean more jobs, so more people are able to be employed. If we wanted to investigate further, we could examine whether the ratio of companies to population has an effect on GDP per capita.
The working population ratio has a positive coefficient while the elderly ratio has a negative one. This is in line with the logic that a larger economically active share of the population contributes to GDP per capita, whereas a higher share of elderly dependents results in lower GDP per capita. For our Gross Value Added ratios by sector, all of them contribute positively to GDP per capita; however, the industry ratio contributes far more than the other two. This is most probably due to the way GDP per capita is calculated, with manufacturing sectors contributing more to it than others.
The IDHM, which is our Human Development Index (HDI), is also positively associated with GDP per capita. However, it is not certain that this is a causal relationship, as there might be reverse causality: higher GDP per capita often leads to better outcomes in life. But because the HDI data was recorded in 2010 and the GDP per capita in 2016, we can more safely say that a higher HDI might lead to greater GDP per capita for the people.
In terms of our categorical variables, being classified as a Rural Remote or Intermediate Remote region is positively associated with GDP per capita. This roughly matches our choropleth map, which showed inland areas with higher GDP per capita than the areas one would expect to be more urbanized. This could be due to a lower population in these remote areas and a greater focus on industrial or manufacturing jobs.
Interestingly, the main-sector label for Gross Value Added shows that municipalities whose main sector is public services (public administration, defense, education, health and social security) are negatively associated with GDP per capita. This may be due to municipalities being specialized for certain government functions. Other services follow the same negative association, although it is not clear why. As expected, municipalities whose main economic activity is commercial have the largest positive coefficient, but surprisingly public utilities (electricity and gas, water, sewage, waste management and decontamination activities) come in close behind, ahead of municipalities labelled as industrial or industrial transformation.
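The adjusted R-squared values reported by the three models can be reproduced from the printed R-squared, the number of observations and the number of estimated slopes, using adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1). The sketch below uses the rounded figures from the three summaries above, with n = 5564 municipalities.

```r
# Figures copied from the three model summaries above (rounded as printed)
n  <- 5564                       # observations
r2 <- c(full = 0.4880, reduced = 0.4685, final = 0.4671)
k  <- c(full = 45,     reduced = 18,     final = 15)   # estimated slopes

# The adjustment penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))
```

With this many observations the penalty is small, so the drop across the three models mostly reflects the lower R-squared after removing predictors.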
VIF <- ols_vif_tol(GDPPC_sig2.mlr)
VIF
## Variables Tolerance VIF
## 1 POP_WORKING_RATIO 0.3671431 2.723734
## 2 POP_ELDERLY_RATIO 0.7191852 1.390463
## 3 GVA_AGROPEC_RATIO 0.5633825 1.774993
## 4 GVA_INDUSTRY_RATIO 0.5075373 1.970299
## 5 GVA_SERVICES_RATIO 0.6316755 1.583091
## 6 CAT_INTERMEDIATE_REMOTE 0.9602515 1.041394
## 7 CAT_RURAL_REMOTE 0.8783178 1.138540
## 8 GVA_MAIN_Public_Sector 0.3180169 3.144487
## 9 GVA_MAIN_Commercial 0.9193246 1.087755
## 10 GVA_MAIN_Other_services 0.3603482 2.775094
## 11 GVA_MAIN_Public_Utilities 0.7619520 1.312419
## 12 GVA_MAIN_Industry_transformation 0.5950853 1.680431
## 13 GVA_MAIN_Industrial 0.8998277 1.111324
## 14 IDHM 0.2730812 3.661915
## 15 COMP_TOT 0.9651462 1.036112
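As we can see from our VIF analysis, every VIF is well below the commonly used thresholds of 5 or 10 (the largest is 3.66 for IDHM), so none of our variables is redundant, which agrees with the correlation analysis done earlier.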
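The two columns in the table above are linked: tolerance is 1 - R² from regressing one predictor on all the others, and VIF is its reciprocal. The sketch below first checks this against three rows of the table, then computes a VIF by hand on toy data; `vif_manual()` is a hypothetical helper written for this illustration, not part of olsrr.

```r
# Tolerance values copied from the VIF table above
tolerance <- c(IDHM = 0.2730812,
               GVA_MAIN_Public_Sector = 0.3180169,
               COMP_TOT = 0.9651462)
vif <- 1 / tolerance
print(round(vif, 6))

# Hypothetical helper: VIF of one predictor from its auxiliary regression
vif_manual <- function(data, predictor) {
  others <- setdiff(names(data), predictor)
  aux <- lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(aux)$r.squared)
}

# Toy predictors: x3 is partly determined by x1, so its VIF is inflated
set.seed(3)
toy <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
toy$x3 <- toy$x1 + rnorm(300, sd = 0.5)
vif_x3 <- vif_manual(toy, "x3")
print(vif_x3)
```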
ols_plot_resid_fit(GDPPC_sig2.mlr)
From the plot above, we can see that the residuals are scattered fairly evenly around the zero line. This means that the model passes the linearity assumption required in multiple linear regression analysis. Additionally, there do not seem to be any obvious signs of heteroscedasticity in the plot.
ols_plot_resid_hist(GDPPC_sig2.mlr)
The figure reveals that the residuals of the multiple linear regression model resemble a normal distribution, which passes the normality assumption. We would normally use ols_test_normality() to further test this assumption, but the function is limited to sample sizes between 3 and 5000 and we have 5564 observations. We will therefore skip this step, as the plot gives us sufficient evidence that the model passes the normality assumption.
The model we built uses geographically referenced attributes, hence it is also important for us to visualize the residuals of the model in order to check for spatial autocorrelation.
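One possible workaround for the sample-size limit, which is not used in this analysis, is to run a Shapiro-Wilk test on a random subsample of 5000 residuals. The sketch below uses simulated values as a stand-in for the model residuals, so its result says nothing about our actual model; it only shows the mechanics.

```r
# Stand-in for the 5564 model residuals (simulated, not the real residuals)
set.seed(123)
res <- rnorm(5564)

# shapiro.test() accepts between 3 and 5000 values, so draw a subsample
sub <- sample(res, 5000)
sw  <- shapiro.test(sub)
print(sw$p.value)
```

Because the subsample is random, the p-value changes from draw to draw, so this complements the histogram rather than replacing it.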
mlr.output <- as.data.frame(GDPPC_sig2.mlr$residuals)
Brazil_residual.sf <- cbind(Brazil_sig_Indic.sf,
GDPPC_sig2.mlr$residuals) %>%
rename(`MLR_RES` = `GDPPC_sig2.mlr.residuals`)
tmap_mode("plot")
tm_shape(Brazil_residual.sf)+
tm_fill("MLR_RES",
n = 6,
style = "quantile",
palette = "RdYlBu" ) +
tm_borders(alpha = 0.5)
From our map of the residuals, there is no clear sign of clustering or of any geospatial pattern in their distribution. However, we can test this using the Moran’s I test.
For this, we will use the point locations of the municipalities, since we already have them. We assume the row order is unchanged, as we have not done any form of sorting.
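The row-order assumption can be verified instead of assumed. The sketch below uses two small toy tables with a shared key; the key name CITY_STATE is an assumption about what both real objects contain, and the values are made up. It checks that the keys line up and shows how match() restores the order if they do not.

```r
# Toy tables sharing a key column (hypothetical values)
polygons <- data.frame(CITY_STATE = c("A_GO", "B_MG", "C_PA"),
                       stringsAsFactors = FALSE)
points   <- data.frame(CITY_STATE = c("A_GO", "B_MG", "C_PA"),
                       stringsAsFactors = FALSE)

# TRUE only if both tables list the same keys in the same order
same_order <- identical(polygons$CITY_STATE, points$CITY_STATE)
print(same_order)

# If the order differs, realign one table to the other by key
shuffled  <- points[c(3, 1, 2), , drop = FALSE]
realigned <- shuffled[match(polygons$CITY_STATE, shuffled$CITY_STATE), ,
                      drop = FALSE]
print(identical(realigned$CITY_STATE, polygons$CITY_STATE))
```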
Brazil_cities.sp <- as_Spatial(Brazil_cities.sf)
#st_crs(Brazil_cities.sf)
proj4string(Brazil_cities.sp)
## [1] "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"
coords <- coordinates(Brazil_cities.sp)
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6029 9.1046 13.1276 17.0081 19.7337 363.0083
nb <- dnearneigh(coordinates(Brazil_cities.sp), 0, 364, longlat = TRUE)
nb_lw <- nb2listw(nb, style = 'W')
lm.morantest(GDPPC_sig2.mlr, nb_lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:18]
## %>% st_set_geometry(NULL))
## weights: nb_lw
##
## Moran I statistic standard deviate = 0.69146, p-value = 0.2446
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 6.391678e-04 -1.799828e-04 1.403444e-06
Based on the global Moran’s I test, the p-value (0.2446) is above 0.05, so we are unable to reject the null hypothesis that the residuals are randomly distributed in space. In other words, the test gives no evidence of spatial autocorrelation in the regression residuals at this neighbourhood distance, which lets us trust the relationships in our model a little more.
We will now try to refine our regression using the GWmodel package.
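To make the statistic less of a black box, here is a minimal base R sketch of how global Moran's I is computed from a row-standardised weight matrix (the equivalent of style = 'W'). The five-cell layout and the values in `x` are a made-up toy example, not the Brazil data. Note that lm.morantest() additionally adjusts the expectation and variance for the regression design, which this sketch does not do.

```r
# Toy example: 5 locations on a line, each a neighbour of the adjacent ones
n <- 5
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

# Row-standardise so that every row of weights sums to 1
W <- W / rowSums(W)

# A smoothly increasing variable, so neighbours have similar values
x <- c(1, 2, 3, 4, 5)
z <- x - mean(x)

# Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
S0 <- sum(W)
moran_I <- (n / S0) * sum(W * outer(z, z)) / sum(z^2)
print(moran_I)
```

A value well above zero, as in this toy case, indicates that similar values sit next to each other. The value reported for our residuals is very close to zero, which is what spatial randomness looks like.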
Joint_sf <- left_join(Brazil_cities.sf[,1], Brazil_sig_Indic.sf %>% st_set_geometry(NULL))
Joint_sp <- as_Spatial(Joint_sf)
summary(Joint_sp@data)
## CITY_STATE GDP_CAPITA POP_WORKING_RATIO
## Abadia De Goiás_GO : 1 Min. : 3191 Min. :0.4716
## Abadia Dos Dourados_MG: 1 1st Qu.: 9062 1st Qu.:0.6087
## Abadiânia_GO : 1 Median : 15870 Median :0.6325
## Abaeté_MG : 1 Mean : 21122 Mean :0.6308
## Abaetetuba_PA : 1 3rd Qu.: 26155 3rd Qu.:0.6543
## Abaiara_CE : 1 Max. :314638 Max. :0.7448
## (Other) :5558
## POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO GVA_SERVICES_RATIO
## Min. :0.02255 Min. :0.00000 Min. :0.0000157 Min. :0.0000461
## 1st Qu.:0.09799 1st Qu.:0.03364 1st Qu.:0.0368730 1st Qu.:0.1985910
## Median :0.11921 Median :0.15062 Median :0.0714602 Median :0.3117002
## Mean :0.12009 Mean :0.21034 Mean :0.1377745 Mean :0.3260963
## 3rd Qu.:0.14103 3rd Qu.:0.34094 3rd Qu.:0.1795132 3rd Qu.:0.4600063
## Max. :0.42199 Max. :0.99877 Max. :0.9991868 Max. :0.9995977
##
## CAT_INTERMEDIATE_REMOTE CAT_RURAL_REMOTE GVA_MAIN_Public_Sector
## Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.01078 Mean :0.05805 Mean :0.4892
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.00000 Max. :1.0000
##
## GVA_MAIN_Commercial GVA_MAIN_Other_services GVA_MAIN_Public_Utilities
## Min. :0.000000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.000000 Median :0.0000 Median :0.00000
## Mean :0.008267 Mean :0.2653 Mean :0.01761
## 3rd Qu.:0.000000 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.0000 Max. :1.00000
##
## GVA_MAIN_Industry_transformation GVA_MAIN_Industrial IDHM
## Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.4077
## Median :0.00000 Median :0.00000 Median :0.5563
## Mean :0.04691 Mean :0.00629 Mean :0.5432
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.6757
## Max. :1.00000 Max. :1.00000 Max. :1.0000
##
## COMP_TOT
## Min. :0.0000000
## 1st Qu.:0.0001169
## Median :0.0002941
## Mean :0.0016997
## 3rd Qu.:0.0008356
## Max. :1.0000000
##
Building Fixed Bandwidth GWR Model
We will be using a fixed bandwidth here due to the varying nature of the polygons in Brazil.
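As a rough illustration of what a fixed-bandwidth Gaussian kernel does, the sketch below fits a weighted least squares regression at two locations, which is the core idea behind GWR. It assumes the Gaussian weight form exp(-0.5 * (d / bw)^2), uses simulated data on toy planar coordinates with Euclidean distances (not great-circle distances), and the helper names `gauss_kernel` and `local_fit` are hypothetical. It is not a substitute for gwr.basic().

```r
set.seed(1)
n <- 200

# Toy planar coordinates in km (assumption: simulated, not Brazil)
coords <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))

# One predictor whose true slope rises from 1 in the west to 2 in the east
x1 <- rnorm(n)
true_slope <- 1 + coords[, "x"] / 1000
y <- 5 + true_slope * x1 + rnorm(n, sd = 0.1)

# Gaussian kernel: weight 1 at distance 0, exp(-0.5) at d == bw
gauss_kernel <- function(d, bw) exp(-0.5 * (d / bw)^2)

# Local regression at observation i: weighted least squares with kernel weights
local_fit <- function(i, bw) {
  d <- sqrt((coords[, "x"] - coords[i, "x"])^2 +
            (coords[, "y"] - coords[i, "y"])^2)
  w <- gauss_kernel(d, bw)
  coef(lm(y ~ x1, weights = w))
}

west <- which.min(coords[, "x"])
east <- which.max(coords[, "x"])
b_west <- local_fit(west, bw = 100)
b_east <- local_fit(east, bw = 100)
print(rbind(west = b_west, east = b_east))
```

With a fixed bandwidth, the same distance decay applies everywhere, so regression points in sparsely populated areas effectively draw on fewer observations than points in dense areas.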
#bw.fixed <- bw.gwr(formula = GDP_CAPITA ~ POP_WORKING_RATIO + POP_ELDERLY_RATIO + GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO + CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector + GVA_MAIN_Commercial + GVA_MAIN_Other_services+ GVA_MAIN_Public_Utilities + GVA_MAIN_Industry_transformation + GVA_MAIN_Industrial + IDHM + COMP_TOT, data=Joint_sp, approach= "AIC", kernel="gaussian", adaptive=FALSE, longlat=TRUE)
# Could not resolve the issue
Since the bandwidth search could not be completed, we use the 364 km distance established earlier as the fixed bandwidth.
gwr.fixed <- gwr.basic(formula = GDP_CAPITA ~ POP_WORKING_RATIO + POP_ELDERLY_RATIO + GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO + CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector + GVA_MAIN_Commercial + GVA_MAIN_Other_services+ GVA_MAIN_Public_Utilities + GVA_MAIN_Industry_transformation + GVA_MAIN_Industrial + IDHM + COMP_TOT, data=Joint_sp, bw=364, kernel = 'gaussian', longlat = TRUE)
gwr.fixed
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-06-01 00:21:15
## Call:
## gwr.basic(formula = GDP_CAPITA ~ POP_WORKING_RATIO + POP_ELDERLY_RATIO +
## GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO +
## CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector +
## GVA_MAIN_Commercial + GVA_MAIN_Other_services + GVA_MAIN_Public_Utilities +
## GVA_MAIN_Industry_transformation + GVA_MAIN_Industrial +
## IDHM + COMP_TOT, data = Joint_sp, bw = 364, kernel = "gaussian",
## longlat = TRUE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO GVA_SERVICES_RATIO CAT_INTERMEDIATE_REMOTE CAT_RURAL_REMOTE GVA_MAIN_Public_Sector GVA_MAIN_Commercial GVA_MAIN_Other_services GVA_MAIN_Public_Utilities GVA_MAIN_Industry_transformation GVA_MAIN_Industrial IDHM COMP_TOT
## Number of data points: 5564
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42091 -5367 -884 3055 252671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13922.8 6105.4 -2.280 0.022622 *
## POP_WORKING_RATIO 29840.1 10241.3 2.914 0.003586 **
## POP_ELDERLY_RATIO -37947.9 6994.2 -5.426 6.02e-08 ***
## GVA_AGROPEC_RATIO 9008.2 1287.0 6.999 2.88e-12 ***
## GVA_INDUSTRY_RATIO 22507.7 1633.3 13.780 < 2e-16 ***
## GVA_SERVICES_RATIO 4989.7 1148.6 4.344 1.42e-05 ***
## CAT_INTERMEDIATE_REMOTE 4304.2 1967.3 2.188 0.028725 *
## CAT_RURAL_REMOTE 3815.4 908.6 4.199 2.72e-05 ***
## GVA_MAIN_Public_Sector -11314.6 706.3 -16.019 < 2e-16 ***
## GVA_MAIN_Commercial 21167.5 2293.4 9.230 < 2e-16 ***
## GVA_MAIN_Other_services -9509.9 751.3 -12.657 < 2e-16 ***
## GVA_MAIN_Public_Utilities 19421.8 1734.1 11.200 < 2e-16 ***
## GVA_MAIN_Industry_transformation 11100.7 1220.7 9.094 < 2e-16 ***
## GVA_MAIN_Industrial 15348.7 2654.9 5.781 7.82e-09 ***
## IDHM 38163.3 2351.1 16.232 < 2e-16 ***
## COMP_TOT 46378.4 12894.6 3.597 0.000325 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 14850 on 5548 degrees of freedom
## Multiple R-squared: 0.4671
## Adjusted R-squared: 0.4657
## F-statistic: 324.2 on 15 and 5548 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 1.223844e+12
## Sigma(hat): 14833.64
## AIC: 122702.5
## AICc: 122702.6
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 364
## Regression points: the same locations as observations are used.
## Distance metric: Great Circle distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -92329.16 -40715.76 -13121.18 6452.30
## POP_WORKING_RATIO -50961.49 9286.58 18723.07 58946.25
## POP_ELDERLY_RATIO -300167.48 -53694.90 -40170.12 -24763.62
## GVA_AGROPEC_RATIO -1419.48 2350.16 8997.05 20989.29
## GVA_INDUSTRY_RATIO -4759.82 10065.73 29567.73 33829.25
## GVA_SERVICES_RATIO -10466.24 1004.91 9619.26 14927.33
## CAT_INTERMEDIATE_REMOTE -2499.98 1220.93 5936.90 19238.03
## CAT_RURAL_REMOTE -2115.10 255.87 2455.88 5966.25
## GVA_MAIN_Public_Sector -18916.73 -10431.07 -8711.67 -8063.23
## GVA_MAIN_Commercial -14577.52 11193.58 16074.03 33082.35
## GVA_MAIN_Other_services -32806.76 -7849.04 -7358.40 -5494.94
## GVA_MAIN_Public_Utilities -21453.80 11024.04 18877.86 28535.35
## GVA_MAIN_Industry_transformation -62782.35 7649.03 13818.96 18882.11
## GVA_MAIN_Industrial -12371.02 5276.13 19584.56 23744.54
## IDHM 1910.09 15644.84 37478.87 48643.06
## COMP_TOT -601785.36 38835.86 57729.70 102521.38
## Max.
## Intercept 26461.8
## POP_WORKING_RATIO 111318.4
## POP_ELDERLY_RATIO 53442.4
## GVA_AGROPEC_RATIO 31812.6
## GVA_INDUSTRY_RATIO 45073.8
## GVA_SERVICES_RATIO 26536.2
## CAT_INTERMEDIATE_REMOTE 40003.4
## CAT_RURAL_REMOTE 15040.3
## GVA_MAIN_Public_Sector 1064.7
## GVA_MAIN_Commercial 56781.7
## GVA_MAIN_Other_services 4793.0
## GVA_MAIN_Public_Utilities 38054.1
## GVA_MAIN_Industry_transformation 27756.8
## GVA_MAIN_Industrial 34343.3
## IDHM 132568.6
## COMP_TOT 1189196.5
## ************************Diagnostic information*************************
## Number of data points: 5564
## Effective number of parameters (2trace(S) - trace(S'S)): 215.9051
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5348.095
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122049.3
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 121879.2
## Residual sum of squares: 1.032151e+12
## R-square value: 0.5506
## Adjusted R-square value: 0.5324541
##
## ***********************************************************************
## Program stops at: 2020-06-01 00:21:59
Using the 364 km bandwidth established earlier, the adjusted R-square rises from 0.4657 for the global model to 0.5325 for the GWR model, and the AICc falls from 122702.6 to 122049.3, which suggests that the geographically weighted method gives a better model overall. However, we still need to check how the local R-square values are distributed geographically.
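The size of the improvement can be made explicit with a little arithmetic. The figures below are copied by hand from the printed output above (global regression versus GWR), and the vector names `ols` and `gwr` are hypothetical. A commonly quoted rule of thumb treats an AICc difference of more than about 3 as a meaningful improvement.

```r
# Values copied from the gwr.basic() printout above
ols <- c(adj_r2 = 0.4657,    aicc = 122702.6)
gwr <- c(adj_r2 = 0.5324541, aicc = 122049.3)

# Positive numbers mean the GWR model is doing better
delta_aicc   <- unname(ols["aicc"] - gwr["aicc"])
delta_adj_r2 <- unname(gwr["adj_r2"] - ols["adj_r2"])

print(delta_aicc)
print(delta_adj_r2)
```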
GWR.sf <- st_as_sf(gwr.fixed$SDF) %>%
st_transform(4674)
GWR.sf.transformed <- st_transform(GWR.sf, 4674)
gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)
Brazil_sig_Indic.sf.fixed <- cbind(Brazil_sig_Indic.sf, as.matrix(gwr.fixed.output))
range(Brazil_sig_Indic.sf.fixed$Local_R2)
## [1] 0.4524265 0.9703265
summary(gwr.fixed$SDF$yhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -15511 8943 17909 21388 29558 104354
tm_shape(Brazil_sig_Indic.sf.fixed) +
tm_fill(col = "Local_R2",
style = "jenks",
palette = "Greens",
title = "R-squared Values")
As we can see, there does not seem to be any clear spatial pattern in the local R-squared values. Although the model explains some areas better than others (local R-squared ranges from about 0.45 to 0.97), it is not clear why this is the case.
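One way to look for structure that the map does not make obvious is to summarise the local R-squared by state. The sketch below assumes that CITY_STATE ends with an underscore followed by the state abbreviation, as in the summary output earlier. The table, its Local_R2 values and the "Example Town_MG" row are made up for illustration, and the names `local_r2` and `by_state` are hypothetical.

```r
# Toy table: Local_R2 values are invented, not taken from the model
local_r2 <- data.frame(
  CITY_STATE = c("Abaetetuba_PA", "Abaiara_CE",
                 "Abadia Dos Dourados_MG", "Example Town_MG"),
  Local_R2 = c(0.60, 0.50, 0.80, 0.90),
  stringsAsFactors = FALSE
)

# Keep only the text after the last underscore, i.e. the state code
local_r2$STATE <- sub(".*_", "", local_r2$CITY_STATE)

# Mean local R-squared per state, sorted by state code
by_state <- aggregate(Local_R2 ~ STATE, data = local_r2, FUN = mean)
print(by_state)
```

Applied to the real GWR output, a table like this would show whether the better-explained municipalities are concentrated in particular states.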