Creating the cases table
Filtering COVID-19 cases only
library(dplyr)  # filter(), select() and %>%
casos_fil_covid <- filter(casos_fil, CLASSI_FIN == 5)  # 5 = SRAG due to COVID-19 (final classification)
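In the SRAG notification dictionary (SIVEP-Gripe), which these column names appear to follow, CLASSI_FIN = 5 is the final classification "SRAG by COVID-19". One detail worth knowing: `dplyr::filter()` keeps only rows where the condition is `TRUE`, so rows with a missing CLASSI_FIN are silently dropped, whereas plain `[` indexing with a logical vector containing `NA` returns an all-NA row instead. The base-R sketch below shows both on hypothetical toy data (`casos_toy` is illustrative, not the real table).

```r
# Hypothetical toy data standing in for casos_fil
casos_toy <- data.frame(id = 1:5, CLASSI_FIN = c(5, 4, NA, 5, 1))

# `==` returns NA where CLASSI_FIN is missing
mask <- casos_toy$CLASSI_FIN == 5
print(mask)

# which() keeps only the TRUE positions, matching what dplyr::filter() does
covid_toy <- casos_toy[which(mask), ]
print(covid_toy$id)

# Plain logical indexing would also return an all-NA row for the missing value
print(nrow(casos_toy[mask, ]))
```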
Creating the earliest-date column
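The code for this step is not shown here. A minimal sketch of one way to compute it, assuming DT_MIN is the earliest of several date columns in each row (which columns were actually used is not stated; the three below are hypothetical choices), uses `pmin()` with `na.rm = TRUE` so that a single missing date does not turn the result into NA:

```r
# Hypothetical toy data; the real date columns behind DT_MIN are not shown
datas <- data.frame(
  DT_NOTIFIC = as.Date(c("2021-03-10", "2020-12-01")),
  DT_INTERNA = as.Date(c("2021-03-08", NA)),
  DT_EVOLUCA = as.Date(c("2021-03-20", "2020-11-30"))
)

# Row-wise earliest date; na.rm = TRUE ignores missing dates in a row
datas$DT_MIN <- pmin(datas$DT_NOTIFIC, datas$DT_INTERNA, datas$DT_EVOLUCA, na.rm = TRUE)
print(datas$DT_MIN)
```

`pmin()` works element-wise and keeps the `Date` class, so the new column can be summarized like the other date columns.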
Table summary
summary(casos_fil_covid)
## DT_NOTIFIC SEM_NOT SG_UF_NOT ID_MUNICIP
## Min. :2020-02-21 Min. : 1.00 Length:2136005 Length:2136005
## 1st Qu.:2020-11-11 1st Qu.:12.00 Class :character Class :character
## Median :2021-03-24 Median :21.00 Mode :character Mode :character
## Mean :2021-03-17 Mean :22.53
## 3rd Qu.:2021-06-15 3rd Qu.:30.00
## Max. :2022-12-09 Max. :53.00
##
## CO_MUN_NOT CS_SEXO DT_NASC CS_GESTANT
## Min. :110001 Length:2136005 Length:2136005 Min. :0.000
## 1st Qu.:310620 Class :character Class :character 1st Qu.:5.000
## Median :351880 Mode :character Mode :character Median :6.000
## Mean :344069 Mean :5.779
## 3rd Qu.:410690 3rd Qu.:6.000
## Max. :530010 Max. :9.000
##
## CS_RACA CS_ESCOL_N PAC_COCBO CS_ZONA
## Min. :1.00 Min. :0.0 Length:2136005 Min. :1.00
## 1st Qu.:1.00 1st Qu.:2.0 Class :character 1st Qu.:1.00
## Median :4.00 Median :4.0 Mode :character Median :1.00
## Mean :3.49 Mean :5.4 Mean :1.15
## 3rd Qu.:4.00 3rd Qu.:9.0 3rd Qu.:1.00
## Max. :9.00 Max. :9.0 Max. :9.00
## NA's :28576 NA's :716885 NA's :233075
## VACINA_COV VACINA DT_UT_DOSE HOSPITAL
## Min. :1.0 Min. :1.0 Length:2136005 Min. :1.00
## 1st Qu.:2.0 1st Qu.:2.0 Class :character 1st Qu.:1.00
## Median :2.0 Median :2.0 Mode :character Median :1.00
## Mean :2.6 Mean :5.1 Mean :1.04
## 3rd Qu.:2.0 3rd Qu.:9.0 3rd Qu.:1.00
## Max. :9.0 Max. :9.0 Max. :9.00
## NA's :326476 NA's :528302 NA's :45286
## DT_INTERNA UTI DT_ENTUTI
## Min. :2020-01-05 Min. :1.00 Min. :2020-01-05
## 1st Qu.:2020-11-06 1st Qu.:1.00 1st Qu.:2020-11-09
## Median :2021-03-21 Median :2.00 Median :2021-03-24
## Mean :2021-03-23 Mean :1.79 Mean :2021-03-17
## 3rd Qu.:2021-06-11 3rd Qu.:2.00 3rd Qu.:2021-06-14
## Max. :9202-09-11 Max. :9.00 Max. :4202-05-26
## NA's :254657
## DT_SAIDUTI CLASSI_FIN EVOLUCAO DT_EVOLUCA
## Min. :2020-02-21 Min. :5 Min. :1.0 Min. :2020-02-21
## 1st Qu.:2020-11-12 1st Qu.:5 1st Qu.:1.0 1st Qu.:2020-11-17
## Median :2021-03-26 Median :5 Median :1.0 Median :2021-03-31
## Mean :2021-03-18 Mean :5 Mean :1.5 Mean :2021-03-23
## 3rd Qu.:2021-06-16 3rd Qu.:5 3rd Qu.:2.0 3rd Qu.:2021-06-21
## Max. :2121-03-13 Max. :5 Max. :9.0 Max. :2022-12-04
## NA's :99481
## DT_MIN.DT_MIN
## Min. :2020-01-05
## 1st Qu.:2020-11-05
## Median :2021-03-21
## Mean :2021-03-12
## 3rd Qu.:2021-06-10
## Max. :2022-12-04
##
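The summary exposes implausible maxima that look like data-entry errors: DT_INTERNA reaches 9202-09-11, DT_ENTUTI 4202-05-26 and DT_SAIDUTI 2121-03-13. One way to flag such values, sketched here with toy values and a hypothetical cutoff date, is to compare each date against an upper bound and treat the offending entries as missing:

```r
# Toy dates; the second one mimics the 9202 outlier seen in DT_INTERNA
datas_toy <- as.Date(c("2021-03-21", "9202-09-11", NA, "2020-11-06"))
limite <- as.Date("2022-12-31")  # hypothetical upper bound for plausible dates

# Flag non-missing dates after the cutoff (NA comparisons become FALSE)
implausivel <- !is.na(datas_toy) & datas_toy > limite
print(which(implausivel))

# Treat flagged dates as missing
datas_toy[implausivel] <- NA
print(datas_toy)
```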
Creating tables by variable type (there are many columns, several of them with many missing values, so to check which ones affect mortality I will split them by type, e.g. demographic, comorbidities, vaccination-related, etc.)
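The split described above can be sketched as a named list of column groups, producing one table per group that always keeps the outcome EVOLUCAO. The group names and memberships below are hypothetical (comorbidity columns do not appear in this section), and `casos_toy` is illustrative data:

```r
# Hypothetical toy data with a few of the columns seen above
casos_toy <- data.frame(
  EVOLUCAO   = c(1, 2, 1),
  CS_SEXO    = c("F", "M", "M"),
  CS_RACA    = c(1, 4, 9),
  VACINA_COV = c(2, 1, 9),
  UTI        = c(2, 1, 2)
)

# Hypothetical grouping of columns by type
grupos <- list(
  demograficas = c("CS_SEXO", "CS_RACA"),
  vacinacao    = c("VACINA_COV"),
  internacao   = c("UTI")
)

# One table per group, each keeping the outcome column
tabelas <- lapply(grupos, function(cols) casos_toy[, c("EVOLUCAO", cols), drop = FALSE])
print(names(tabelas$demograficas))
```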
Demographic variables
Creating dummy variables
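library(fastDummies)  # provides dummy_cols()
# One 0/1 column per category; columns with NA values also get a *_NA column
casos_fil_covid <- dummy_cols(casos_fil_covid, select_columns = c('CS_SEXO', 'CS_RACA', 'CS_ZONA', 'VACINA_COV', 'UTI', 'CLASSI_FIN'),
                              remove_selected_columns = TRUE)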
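`dummy_cols()` creates one 0/1 column per category. As the summary of the cleaned data shows, when a column has missing values (CS_RACA, CS_ZONA, VACINA_COV, UTI) it also adds a `_NA` indicator and leaves the other dummies as NA in those rows, while CS_SEXO, which has no missing values, gets no `_NA` column. The base-R function below is a hypothetical re-implementation of that behavior, for illustration only. Note also that, since the data were already filtered to CLASSI_FIN == 5, the resulting CLASSI_FIN_5 column is constant (always 1) and carries no information for the analysis.

```r
# Hypothetical sketch of one-hot encoding with an explicit _NA indicator
one_hot <- function(x, prefix) {
  niveis <- sort(unique(x[!is.na(x)]))
  # One 0/1 column per observed level; `x == v` is NA where x is NA
  out <- lapply(niveis, function(v) as.integer(x == v))
  names(out) <- paste0(prefix, "_", niveis)
  out <- as.data.frame(out)
  # Add the _NA column only when the input actually has missing values
  if (anyNA(x)) out[[paste0(prefix, "_NA")]] <- as.integer(is.na(x))
  out
}

res <- one_hot(c(1, 4, NA, 1, 9), "CS_RACA")
print(res)
```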
Proportion of missing values in the EVOLUCAO column (about 4.65%).
sum(is.na(casos_fil_covid$EVOLUCAO)) / length(casos_fil_covid$EVOLUCAO)
## [1] 0.0465484
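The same quantity can be computed for every column at once: `mean()` of a logical vector is the share of `TRUE` values, so `colMeans(is.na(df))` gives the proportion of missing values per column. A sketch on hypothetical toy data:

```r
# Hypothetical toy data with missing values
df_toy <- data.frame(EVOLUCAO = c(1, NA, 2, 1), UTI = c(NA, NA, 1, 2))

# Equivalent to sum(is.na(x)) / length(x), applied column by column
prop_na <- colMeans(is.na(df_toy))

# Percentages, sorted from most to least missing
print(sort(round(100 * prop_na, 2), decreasing = TRUE))
```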
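I will drop the columns that are not in one-hot form (location codes, dates and the categorical variables that were not dummified), so that, apart from the outcome EVOLUCAO and idade, the data are one-hot encoded.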
casos_fil_covid <- casos_fil_covid %>%
  select(-SG_UF_NOT, -ID_MUNICIP, -CO_MUN_NOT, -DT_NASC, -CS_GESTANT, -CS_ESCOL_N, -VACINA, -HOSPITAL, -DT_MIN, -DT_EVOLUCA, -DT_SAIDUTI, -DT_ENTUTI, -DT_INTERNA, -DT_NOTIFIC, -PAC_COCBO, -DT_UT_DOSE, -SEM_NOT)
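After dropping the columns, a quick check can confirm which of the remaining columns are binary. The helper `eh_binaria()` and the toy data below are hypothetical; on the real table one would expect every column except idade to pass, since idade is continuous and EVOLUCAO already appears recoded to 0/1 in the summary.

```r
# TRUE when a column is numeric and its non-missing values are all 0 or 1
eh_binaria <- function(x) is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1))

# Hypothetical toy data mimicking the cleaned table
df_toy <- data.frame(
  EVOLUCAO  = c(0, 1, NA),
  CS_SEXO_F = c(1L, 0L, 0L),
  idade     = c(45.5, 59.4, 72.5)
)

print(sapply(df_toy, eh_binaria))
```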
Summary of the data frame to be analyzed
summary(casos_fil_covid)
## EVOLUCAO CS_SEXO_F CS_SEXO_I CS_SEXO_M
## Min. :0.00 Min. :0.0000 Min. :0.0000000 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000000 1st Qu.:0.0000
## Median :1.00 Median :0.0000 Median :0.0000000 Median :1.0000
## Mean :0.65 Mean :0.4484 Mean :0.0001272 Mean :0.5515
## 3rd Qu.:1.00 3rd Qu.:1.0000 3rd Qu.:0.0000000 3rd Qu.:1.0000
## Max. :1.00 Max. :1.0000 Max. :1.0000000 Max. :1.0000
## NA's :99152
## CS_RACA_1 CS_RACA_2 CS_RACA_3 CS_RACA_4
## Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.00 Median :0.000
## Mean :0.429 Mean :0.043 Mean :0.01 Mean :0.342
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.00 3rd Qu.:1.000
## Max. :1.000 Max. :1.000 Max. :1.00 Max. :1.000
## NA's :28434 NA's :28434 NA's :28434 NA's :28434
## CS_RACA_5 CS_RACA_9 CS_RACA_NA CS_ZONA_1
## Min. :0.000 Min. :0.000 Min. :0.00000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:1.00
## Median :0.000 Median :0.000 Median :0.00000 Median :1.00
## Mean :0.002 Mean :0.174 Mean :0.01335 Mean :0.93
## 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.00000 3rd Qu.:1.00
## Max. :1.000 Max. :1.000 Max. :1.00000 Max. :1.00
## NA's :28434 NA's :28434 NA's :231863
## CS_ZONA_2 CS_ZONA_3 CS_ZONA_9 CS_ZONA_NA
## Min. :0.00 Min. :0 Min. :0.00 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0 1st Qu.:0.00 1st Qu.:0.0000
## Median :0.00 Median :0 Median :0.00 Median :0.0000
## Mean :0.05 Mean :0 Mean :0.01 Mean :0.1089
## 3rd Qu.:0.00 3rd Qu.:0 3rd Qu.:0.00 3rd Qu.:0.0000
## Max. :1.00 Max. :1 Max. :1.00 Max. :1.0000
## NA's :231863 NA's :231863 NA's :231863
## VACINA_COV_1 VACINA_COV_2 VACINA_COV_9 VACINA_COV_NA
## Min. :0.0 Min. :0.0 Min. :0.0 Min. :0.0000
## 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0000
## Median :0.0 Median :1.0 Median :0.0 Median :0.0000
## Mean :0.2 Mean :0.7 Mean :0.1 Mean :0.1529
## 3rd Qu.:0.0 3rd Qu.:1.0 3rd Qu.:0.0 3rd Qu.:0.0000
## Max. :1.0 Max. :1.0 Max. :1.0 Max. :1.0000
## NA's :325774 NA's :325774 NA's :325774
## UTI_1 UTI_2 UTI_9 UTI_NA
## Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.0000
## Median :0.00 Median :1.00 Median :0.00 Median :0.0000
## Mean :0.37 Mean :0.61 Mean :0.02 Mean :0.1189
## 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:0.00 3rd Qu.:0.0000
## Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.0000
## NA's :253195 NA's :253195 NA's :253195
## CLASSI_FIN_5 idade
## Min. :1 Min. : 0.00274
## 1st Qu.:1 1st Qu.: 45.50685
## Median :1 Median : 59.41918
## Mean :1 Mean : 58.41555
## 3rd Qu.:1 3rd Qu.: 72.51233
## Max. :1 Max. : 99.99726
##