Contexto

Eu fiquei fazendo um monte de conta aqui para calcular direito a taxa de homicídios. Entretanto, a gente tem essas taxas prontas pela equipe do IPEA. Essa base é homicídio-cidade-ano.
LINK

Acredito que é melhor usar essa base do que ir nos microdados do SIM.

Pacotes

library(loadinstall)
#devtools::install_github("flaviaerius/loadinstall")

packages <- c("haven", "readr", "ggplot2", "ggrepel", "ggthemes", 
  "tidyr", "tidyverse", "sidrar", "rbcb", "readxl", "lubridate", 
  "sjPlot", "sjstats", "broom", "kableExtra", "modelsummary", "kableExtra", "plm", "AER", "zoo", "readr", "ggplot2", "methods", "ggthemes", 
  "directlabels", "ggrepel", "readxl", "haven", "dplyr", "knitr", "tidyr", 
  "gghighlight", "fixest", "modelsummary", "basedosdados", "DBI", 
  "data.table", "did", "lubridate", "stargazer", "bigrquery", "stringi", "sidrar", "viridis", "gtsummary",
  "broom", "data.table", "viridis")

lapply(packages, dynamic_require)

Taxas de homicídios por cidade

Estou na segunda tabela do link acima. Ou seja, baixei os dados dos homicídios, nomeei como homicidios_atlas.cvs e agora estou abrindo os dados novamente.

A vantagem dessa base é que ela tem os zeros. Ou seja, eu sei quais lugares que não teve nenhum homicídio.

h1 <- read_delim("homicidios_atlas.csv",  delim = ";", escape_double = FALSE, trim_ws = TRUE)
load("populacao.Rda")

h1$id<-h1$cod
h1$ano<-h1$período
pop$id<-pop$id_municipio

## vou juntar as duas bases 

h2<-merge(pop, h1, by=c("ano", "id"))



h2$txh<-100000*(h2$valor/h2$populacao)

Se os dados daqui estão certos, as maiores taxas de homicídio do planeta estão por volta de 40 por 100 mil pessoas por ano.

Jamaica *	53.3	1,508	2022	Americas	Latin America and the Caribbean
U.S. Virgin Islands	49.6	52	2012	Americas	Latin America and the Caribbean
South Africa *	41.9	24,865	2021	Africa	Sub-Saharan Africa
Saint Vincent and the Grenadines	40.4	42	2022	Americas	Latin America and the Caribbean
Trinidad and Tobago *	40.0	605	2022	Americas	Latin America and the Caribbean

Wikipedia

Então, o que eu vou fazer é dizer que “passou de 40 = tá muito ruim”, e vou colocar uma categoria para isso.

Se eu quero fazer mapas, preciso de shapes!

library(sf)

## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

shpMun = st_read("BR_Municipios_2021.shp")

## Reading layer `BR_Municipios_2021' from data source 
##   `G:\Meu Drive\AGR\BR_Municipios_2021.shp' using driver `ESRI Shapefile'
## Simple feature collection with 5572 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.99045 ymin: -33.75118 xmax: -28.84764 ymax: 5.271841
## Geodetic CRS:  SIRGAS 2000

shpUFs <- st_read(".", "BR_UF_2021")

## Reading layer `BR_UF_2021' from data source `G:\Meu Drive\AGR' using driver `ESRI Shapefile'
## Simple feature collection with 27 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.99045 ymin: -33.75118 xmax: -28.84764 ymax: 5.271841
## Geodetic CRS:  SIRGAS 2000

mapaMun <- as_tibble(shpMun)
mapaMun$id <- mapaMun$NM_MUN
mapaUF <- as_tibble(shpUFs)

Agora eu preciso fazer as duas bases conversarem.

h2$CD_MUN<-h2$id
h3 <- merge(shpMun, h2, by.x = "CD_MUN", by.y = "CD_MUN")

Criando as categorias para facilitar

h3$cat[h3$txh <= 3] <- 1
h3$cat[h3$txh > 3] <- 2
h3$cat[h3$txh > 10] <- 3
h3$cat[h3$txh > 20] <- 4
h3$cat[h3$txh > 30] <- 5
h3$cat[h3$txh > 50] <- 6

table(h3$cat)

## 
##      1      2      3      4      5      6 
## 102611  34523  50501  33222  31479  17697

Fazendo o mapa efetivamente

tx2000 <- ggplot() +
  geom_sf(data = h3[h3$ano == 2000, ], aes(fill = cat), color = NA, size = 0.1) +
  geom_sf(data = shpUFs, color = "black", fill = NA, size = 0.2) +
  theme_minimal() +
  ylab("Latitude") +
  xlab("Longitude") +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
   theme_minimal() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "cm"),
    plot.background = element_blank(),
    legend.position = "bottom") +
 scale_fill_viridis_c(limits = c(1 , 6), oob = scales::squish, 
                         breaks = c(1, 2, 3, 4, 5, 6),  # Specify the breaks
                      labels = c("<3 ", "(3;10)", "(10;20)", "(20;30)","(30;50)", ">50"))+ 
                        guides(fill = guide_legend(title = "Homicídios/100 mil hab/ano"))+    ggtitle("2000") 


# Print the plot

tx2000

ggsave(file = "tx2000.png", plot = tx2000 , width = 6.472, height = 4)



tx2010 <- ggplot() +
  geom_sf(data = h3[h3$ano == 2010, ], aes(fill = cat), color = NA, size = 0.1) +
  geom_sf(data = shpUFs, color = "black", fill = NA, size = 0.2) +
  theme_minimal() +
  ylab("Latitude") +
  xlab("Longitude") +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
   theme_minimal() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "cm"),
    plot.background = element_blank(),
    legend.position = "bottom") +
 scale_fill_viridis_c(limits = c(1 , 6), oob = scales::squish, 
                         breaks = c(1, 2, 3, 4, 5, 6),  # Specify the breaks
                      labels = c("<3 ", "(3;10)", "(10;20)", "(20;30)","(30;50)", ">50"))+ 
                        guides(fill = guide_legend(title = "Homicídios/100 mil hab/ano"))  + ggtitle("2010")  # Add plot title


# Print the plot

tx2010

ggsave(file = "tx2010.png", plot = tx2010 , width = 6.472, height = 4)


tx2020 <- ggplot() +
  geom_sf(data = h3[h3$ano == 2020, ], aes(fill = cat), color = NA, size = 0.1) +
  geom_sf(data = shpUFs, color = "black", fill = NA, size = 0.2) +
  theme_minimal() +
  ylab("Latitude") +
  xlab("Longitude") +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
   theme_minimal() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "cm"),
    plot.background = element_blank(),
    legend.position = "bottom") +
 scale_fill_viridis_c(limits = c(1 , 6), oob = scales::squish, 
                         breaks = c(1, 2, 3, 4, 5, 6),  # Specify the breaks
                      labels = c("<3 ", "(3;10)", "(10;20)", "(20;30)","(30;50)", ">50"))+ 
                        guides(fill = guide_legend(title = "Homicídios/100 mil hab/ano"))+ 
    ggtitle("2020")  # Add plot title


# Print the plot

tx2020

ggsave(file = "tx2020.png", plot = tx2020 , width = 6.472, height = 4)

Note que os mapas estão na mesma escala de cores. Ou seja, o roxo vai ser sempre uma taxa menor do que 3. Isso é importante. Se você deixar ele escolher as cores sozinho, ele vai reescalonar para cada mapa e não vai fazer sentido a comparação.

A base tá aí. Tudo lindo, basta vocês testarem as suas próprias hipóteses.

Só explicando um pouco. Essa base é derivada do Sistema de Informação de Mortalidade (SIM). Aqui estão os dados de uma mortalidade específica, os homicídios.
Mas QUALQUER coisa em relação a mortalidade vai tá no mesmo sistema. Ou seja, se você tiver interesse por

mortes por suicídio
mortes por envenenamento
mortes por acidentes de trabalho

Tudo isso vai tá no mesmo sistema, no SIM.

Por outro lado, os dados de nascidos vivos é uma sistema semelhante, o SINASC. Lá você consegue saber:

peso de criança
quantos nasceram
quantos morreram
quantos nasceram abaixo do peso, ou com algum anomalia

Essas são as coisas que eu lembro de cabeça, mas dá para saber muito mais coisas desses sistemas.

Todas as linhas para acessar esses dados diretamente pela Base dos Dados (BD), são parecidas com isso daqui.

peso <- basedosdados::read_sql(‘SELECT ano, AVG(peso) AS average_peso FROM basedosdados.br_ms_sinasc.microdados WHERE ano BETWEEN 1980 AND 2023 GROUP BY ano ORDER BY ano’)

SIDRAR

Parece feio, mas funciona.

A primeira coisa que você precisa é ter a library do Sidra

sidrar

library(sidrar)

A segunda coisa que você precisa é saber qual a tabela quer!

Isso você vai encontrar no site do Sidra.

Essa é uma amostra (site).

O passo a passo para pegar o código da tabela a gente vai fazer na monitoria.

inf =
'/t/1737/n1/all/v/2266/p/all/d/v2266%2013' %>%
get_sidra(api=.)

## All others arguments are desconsidered when 'api' is informed

AGR_4_Homicidios

Leonardo Cardoso

2024-04-05

Contexto

Pacotes

Taxas de homicídios por cidade

SIDRAR