R Notebook

## Warning: package 'epiDisplay' was built under R version 4.4.3

## Loading required package: foreign

## Loading required package: survival

## Loading required package: MASS

## Loading required package: nnet

## 
## Attaching package: 'dgof'

## The following object is masked from 'package:stats':
## 
##     ks.test

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:epiDisplay':
## 
##     alpha

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:MASS':
## 
##     select

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Warning: package 'crosstable' was built under R version 4.4.3

## Warning: package 'ggalluvial' was built under R version 4.4.3

About \(80\%\) of the deaths are male, and nearly three quarters (\(73.5\%\)) occur in individuals aged \(15\) or older. About \(90\%\) of the deaths are of minors recorded as mestizo (\(61\%\)) or black (\(30\%\)).

library(readxl)  # provides read_excel(); assumed to be loaded in a setup chunk that is not shown
Reporte_Muertes <- read_excel("C:/Users/ASUS/Documents/tesis lenin/Reporte Muertes Violentas Menores 18 Años Unidad Basica Cali 2018 - 2022.xlsx")
tab1(Reporte_Muertes$`RANGO DE EDAD`,
     sort.group = FALSE, cum.percent = TRUE, main = 'Distribución de los rangos etarios de las muertes violentas')

## Reporte_Muertes$`RANGO DE EDAD` : 
##           Frequency Percent Cum. percent
## (00 a 04)        50     7.2          7.2
## (05 a 09)        21     3.0         10.2
## (10 a 14)       114    16.3         26.5
## (15 a 17)       514    73.5        100.0
##   Total         699   100.0        100.0

tab1(Reporte_Muertes$SEXO,
     sort.group = 'decreasing', cum.percent = TRUE,main = 'Distribución del sexo de las muertes violentas')

## Reporte_Muertes$SEXO : 
##                Frequency Percent Cum. percent
## MASCULINO            560    80.1         80.1
## FEMENINO             136    19.5         99.6
## NO DETERMINADO         3     0.4        100.0
##   Total              699   100.0        100.0

tab1(Reporte_Muertes$`ANCESTRO RACIAL`,
     sort.group = 'decreasing', cum.percent = TRUE,horiz=T,main = 'Distribución del ancestro racial de las muertes violentas')

## Reporte_Muertes$`ANCESTRO RACIAL` : 
##                 Frequency Percent Cum. percent
## MESTIZO               428    61.2         61.2
## NEGRO                 206    29.5         90.7
## SIN INFORMACIÓN        23     3.3         94.0
## MULATO                 21     3.0         97.0
## BLANCO                 14     2.0         99.0
## INDIGENA                7     1.0        100.0
##   Total               699   100.0        100.0

#hist(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`,breaks = 18,axes=F,main='Distribución de la edad en años de las muertes violentas',labels=T,ylim=c(0,230))
#axis(1,at = seq(0,18,1),labels = TRUE,pos = 0)

ggplot(Reporte_Muertes %>% filter(SEXO != "NO DETERMINADO"), 
       aes(x = `EDAD CALCULADA EN AÑOS`, fill = SEXO)) +
  geom_histogram(bins = 18, position = "dodge", color = "black") +
  labs(title = "Distribución de la edad en años de las muertes violentas",
       x = "Edad en años",
       y = "Frecuencia") +
  scale_x_continuous(breaks = seq(0, 18, 1)) +
  theme_minimal()

The age distribution is asymmetric and bimodal, so at first glance it does not appear to be normal. The Kolmogorov-Smirnov normality test agrees and rejects normality \((p<0.00001)\), although the reported p-value is only approximate because the ages contain ties (see the warning in the test output).

Most deaths clearly occur among adolescents: \(73.5\%\) of the victims were aged \(15\) to \(17\).

Reporte_Muertes %>%
  group_by(SEXO) %>%
  summarise(
    Min = round(min(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Q1 = round(quantile(`EDAD CALCULADA EN AÑOS`, 0.25, na.rm = TRUE),2),
    Median = round(median(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Mean = round(mean(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Q3 = round(quantile(`EDAD CALCULADA EN AÑOS`, 0.75, na.rm = TRUE),2),
    Max = round(max(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Count = n()
  )

# Kolmogorov-Smirnov test against a normal distribution with the sample mean and SD
ks.test(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`, "pnorm", 
        mean=mean(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`),
        sd=sd(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`))

## Warning in ks.test(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`, "pnorm", mean = mean(Reporte_Muertes$`EDAD CALCULADA EN AÑOS`), : default ks.test() cannot compute correct p-values with ties;
##  see help page for one-sample Kolmogorov test for discrete distributions.

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  Reporte_Muertes$`EDAD CALCULADA EN AÑOS`
## D = 0.29714, p-value < 2.2e-16
## alternative hypothesis: two-sided
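
As a cross-check on the normality result, the sketch below applies the Shapiro-Wilk test (a different test from the Kolmogorov-Smirnov test used above) with base R only. It does not use the real data: `ages_sim` is a synthetic vector, introduced here as an assumption, that draws ages uniformly inside each age range using the group sizes from the age-range frequency table. Its p-value is therefore an illustration and not a result of this study.

```r
# Synthetic ages (NOT the real data): uniform draws within each age range,
# with group sizes taken from the RANGO DE EDAD frequency table.
set.seed(123)
ages_sim <- c(
  sample(0:4,   50,  replace = TRUE),
  sample(5:9,   21,  replace = TRUE),
  sample(10:14, 114, replace = TRUE),
  sample(15:17, 514, replace = TRUE)
)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
sw <- shapiro.test(ages_sim)
print(sw)
```

With the real data, the same call would be applied to the `EDAD CALCULADA EN AÑOS` column after dropping missing values.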

Most deaths (\(95\%\)) showed no signs of abuse.

mytable <- table(Reporte_Muertes$`PRESENTA SIGNOS DE MALTRATO`)
lbls <-  paste0(names(mytable), " (", round(prop.table(mytable)*100, 2), "%)")
pie(mytable, labels = lbls,
   main="Distribución de la presencia de signos de maltrato", radius = 1, cex = 0.8)

tab1(Reporte_Muertes$`PRESENTA SIGNOS DE MALTRATO`,
      cum.percent = TRUE,graph = F)

## Reporte_Muertes$`PRESENTA SIGNOS DE MALTRATO` : 
##           Frequency Percent Cum. percent
## NO              665    95.1         95.1
## NO APLICA         2     0.3         95.4
## SI               32     4.6        100.0
##   Total         699   100.0        100.0

A \(\chi^2\) test is used to check whether body type (TIPO DE CUERPO) and body condition (ESTADO DEL CUERPO) are associated.

\(95.7\%\) of the bodies are complete, and \(94.2\%\) of these (\(630\) of \(669\)) were fresh. All \(17\) bodies whose body type was not filled in (SIN DILIGENCIAR) were fresh. Of the \(11\) bodies classified as INCOMPLETO, \(7\) (\(63.6\%\)) were skeletonized. The remaining categories and combinations have too few observations, or none, to support any conclusion.

tab1(Reporte_Muertes$`TIPO DE CUERPO`,
     sort.group = F, cum.percent = TRUE,main = 'Distribución del tipo de cuerpo de las muertes violentas',horiz=T)

## Reporte_Muertes$`TIPO DE CUERPO` : 
##                 Frequency Percent Cum. percent
## COMPLETO              669    95.7         95.7
## INCOMPLETO             11     1.6         97.3
## INCOMPLETO EXTR         1     0.1         97.4
## INCOMPLETO TRON         1     0.1         97.6
## SIN DILIGENCIAR        17     2.4        100.0
##   Total               699   100.0        100.0

tab1(Reporte_Muertes$`ESTADO DEL CUERPO`,
     sort.group = F, cum.percent = TRUE,main = 'Distribución del estado del cuerpo de las muertes violentas',horiz=T)

## Reporte_Muertes$`ESTADO DEL CUERPO` : 
##                 Frequency Percent Cum. percent
## CALCINADO               1     0.1          0.1
## DESCOMPUESTO           39     5.6          5.7
## ESQUELETIZADO           9     1.3          7.0
## FRESCO                648    92.7         99.7
## SIN INFORMACION         2     0.3        100.0
##   Total               699   100.0        100.0

The contingency table of the two variables has several cells equal or close to \(0\).

label	variable	ESTADO DEL CUERPO
label	variable	CALCINADO	DESCOMPUESTO	ESQUELETIZADO	FRESCO	SIN INFORMACION
TIPO DE CUERPO	COMPLETO	1 (0.15%)	35 (5.23%)	1 (0.15%)	630 (94.17%)	2 (0.30%)
	INCOMPLETO	0 (0%)	3 (27.27%)	7 (63.64%)	1 (9.09%)	0 (0%)
	INCOMPLETO EXTR	0 (0%)	0 (0%)	1 (100.00%)	0 (0%)	0 (0%)
	INCOMPLETO TRON	0 (0%)	1 (100.00%)	0 (0%)	0 (0%)	0 (0%)
	SIN DILIGENCIAR	0 (0%)	0 (0%)	0 (0%)	17 (100.00%)	0 (0%)

Because most of the expected counts in the contingency table are close to \(0\), we also run Fisher's exact test in addition to the \(\chi^2\) test. Both tests reject the null hypothesis at the \(5\%\) significance level, which suggests that body condition and body type are associated.

contingency_table <- table(Reporte_Muertes$`TIPO DE CUERPO`, Reporte_Muertes$`ESTADO DEL CUERPO`)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                  
##                   CALCINADO DESCOMPUESTO ESQUELETIZADO FRESCO SIN INFORMACION
##   COMPLETO             0.96        37.33          8.61 620.19            1.91
##   INCOMPLETO           0.02         0.61          0.14  10.20            0.03
##   INCOMPLETO EXTR      0.00         0.06          0.01   0.93            0.00
##   INCOMPLETO TRON      0.00         0.06          0.01   0.93            0.00
##   SIN DILIGENCIAR      0.02         0.95          0.22  15.76            0.05
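
For reference, the expected counts printed above follow the usual formula under independence, where \(n_{i\cdot}\) and \(n_{\cdot j}\) are the row and column totals and \(N\) is the total number of cases (this notation is introduced here for illustration). As a worked check, the COMPLETO and FRESCO cell uses the totals \(669\) and \(648\) from the two frequency tables:

```latex
\[
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N},
\qquad
E_{\text{COMPLETO},\,\text{FRESCO}} = \frac{669 \times 648}{699} \approx 620.19
\]
```

For this \(5 \times 5\) table the degrees of freedom are \((r-1)(c-1) = (5-1)(5-1) = 16\).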

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 451.7, df = 16, p-value < 2.2e-16

(fisher.test(contingency_table))

## 
##  Fisher's Exact Test for Count Data
## 
## data:  contingency_table
## p-value = 8.168e-16
## alternative hypothesis: two.sided
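
Both tests can be rerun without the Excel file by typing in the counts from the contingency table shown earlier. This is only a sketch: the object names `counts`, `chi`, `share_small` and `fisher_exact` are introduced here, and the threshold of 5 is the common rule of thumb for the \(\chi^2\) approximation, not something the analysis itself states.

```r
# Observed counts copied from the TIPO DE CUERPO x ESTADO DEL CUERPO table.
counts <- matrix(
  c(1, 35, 1, 630, 2,
    0,  3, 7,   1, 0,
    0,  0, 1,   0, 0,
    0,  1, 0,   0, 0,
    0,  0, 0,  17, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    tipo   = c("COMPLETO", "INCOMPLETO", "INCOMPLETO EXTR",
               "INCOMPLETO TRON", "SIN DILIGENCIAR"),
    estado = c("CALCINADO", "DESCOMPUESTO", "ESQUELETIZADO",
               "FRESCO", "SIN INFORMACION")
  )
)

# chisq.test() warns here because many expected counts are tiny,
# so the warning is silenced and the expected counts are inspected directly.
chi <- suppressWarnings(chisq.test(counts))

# Share of cells whose expected count is below 5.
share_small <- mean(chi$expected < 5)
print(round(chi$expected, 2))
print(share_small)

# Fisher's exact test on the same table.
fisher_exact <- fisher.test(counts)
print(fisher_exact$p.value)
```

A large share of cells with small expected counts is the situation in which the exact test is usually preferred to the \(\chi^2\) approximation.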

As explained above, the tests also show an association in this case (\(\chi^2\) test, \(p<0.0001\); Fisher's exact test, \(p<0.0005\)).

tab1(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`,
     sort.group = 'decreasing', cum.percent = TRUE,horiz=T,main = 'Distribución de la manera de muerte de ingreso')

## Reporte_Muertes$`MANERA DE MUERTE DE INGRESO` : 
##                           Frequency Percent Cum. percent
## VIOLENTA - HOMICIDIO            326    46.6         46.6
## NO REGISTRA                     227    32.5         79.1
## VIOLENTA - SIN DETERMINAR        66     9.4         88.6
## VIOLENTA - SUICIDIO              46     6.6         95.1
## VIOLENTA - ACCIDENTAL            29     4.1         99.3
## SIN INFORMACIÓN                   3     0.4         99.7
## NATURAL                           1     0.1         99.9
## ACCIDENTE DE TRANSPORTE           1     0.1        100.0
##   Total                         699   100.0        100.0

df <- as.data.frame(as.table(table(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`, Reporte_Muertes$SEXO)))
crosstable(Reporte_Muertes, "SEXO", by="MANERA DE MUERTE DE INGRESO") %>% as_flextable()

label	variable	MANERA DE MUERTE DE INGRESO
label	variable	ACCIDENTE DE TRANSPORTE	NATURAL	NO REGISTRA	SIN INFORMACIÓN	VIOLENTA - ACCIDENTAL	VIOLENTA - HOMICIDIO	VIOLENTA - SIN DETERMINAR	VIOLENTA - SUICIDIO
SEXO	FEMENINO	0 (0%)	0 (0%)	34 (25.00%)	1 (0.74%)	13 (9.56%)	39 (28.68%)	27 (19.85%)	22 (16.18%)
	MASCULINO	1 (0.18%)	1 (0.18%)	192 (34.29%)	2 (0.36%)	16 (2.86%)	286 (51.07%)	38 (6.79%)	24 (4.29%)
	NO DETERMINADO	0 (0%)	0 (0%)	1 (33.33%)	0 (0%)	0 (0%)	1 (33.33%)	1 (33.33%)	0 (0%)

ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("MANERA DE MUERTE DE INGRESO") +
  ylab("SEXO") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5))+
  labs(title='Manera de muerte de ingreso vs Sexo')

contingency_table <- table(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`, Reporte_Muertes$SEXO)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                            
##                             FEMENINO MASCULINO NO DETERMINADO
##   ACCIDENTE DE TRANSPORTE       0.19      0.80           0.00
##   NATURAL                       0.19      0.80           0.00
##   NO REGISTRA                  44.17    181.86           0.97
##   SIN INFORMACIÓN               0.58      2.40           0.01
##   VIOLENTA - ACCIDENTAL         5.64     23.23           0.12
##   VIOLENTA - HOMICIDIO         63.43    261.17           1.40
##   VIOLENTA - SIN DETERMINAR    12.84     52.88           0.28
##   VIOLENTA - SUICIDIO           8.95     36.85           0.20

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 72.951, df = 14, p-value = 5.611e-10

fisher.test(contingency_table, simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  contingency_table
## p-value = 0.0004998
## alternative hypothesis: two.sided
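
A note on reading this p-value: with `simulate.p.value = TRUE`, R's `fisher.test()` reports a Monte Carlo estimate of the form \((1 + k)/(B + 1)\), where \(k\) counts the simulated tables at least as extreme as the observed one and \(B\) is the number of replicates. With the default \(B = 2000\), the smallest value it can return is \(1/2001\). The short sketch below (variable names are illustrative) computes that floor, which matches the value printed above. The output therefore means that no simulated table was as extreme as the observed one, not that the p-value is exactly this number.

```r
# Smallest p-value a Monte Carlo test with B replicates can report.
B <- 2000                      # default number of replicates in fisher.test()
p_floor <- 1 / (B + 1)
print(signif(p_floor, 4))

# A larger B lowers the floor, at the cost of more computation.
B_large <- 1e5
p_floor_large <- 1 / (B_large + 1)
print(signif(p_floor_large, 4))
```

Passing a larger `B` to `fisher.test()` is one way to obtain a finer resolution if a more precise p-value is needed.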

df <- as.data.frame(as.table(table(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`, Reporte_Muertes$`RANGO DE EDAD`)))
ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("MANERA DE MUERTE DE INGRESO") +
  ylab("RANGO DE EDAD") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.margin = margin(1, 1, -10, 1, "pt"))+
  labs(title='Manera de muerte de ingreso vs Rango de edad')+
  scale_y_discrete(limits = rev)

crosstable(Reporte_Muertes, "RANGO DE EDAD", by="MANERA DE MUERTE DE INGRESO") %>% as_flextable()

label	variable	MANERA DE MUERTE DE INGRESO
label	variable	ACCIDENTE DE TRANSPORTE	NATURAL	NO REGISTRA	SIN INFORMACIÓN	VIOLENTA - ACCIDENTAL	VIOLENTA - HOMICIDIO	VIOLENTA - SIN DETERMINAR	VIOLENTA - SUICIDIO
RANGO DE EDAD	(00 a 04)	0 (0%)	0 (0%)	16 (32.00%)	0 (0%)	9 (18.00%)	10 (20.00%)	15 (30.00%)	0 (0%)
	(05 a 09)	0 (0%)	0 (0%)	9 (42.86%)	0 (0%)	6 (28.57%)	4 (19.05%)	2 (9.52%)	0 (0%)
	(10 a 14)	1 (0.88%)	1 (0.88%)	34 (29.82%)	1 (0.88%)	6 (5.26%)	48 (42.11%)	13 (11.40%)	10 (8.77%)
	(15 a 17)	0 (0%)	0 (0%)	168 (32.68%)	2 (0.39%)	8 (1.56%)	264 (51.36%)	36 (7.00%)	36 (7.00%)

Reporte_Muertes %>%
  group_by(`MANERA DE MUERTE DE INGRESO`) %>%
  summarise(
    Count = n(),  # Total number of cases
    Mean_Age = round(mean(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Mean age
    SD_Age =  round(sd(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Standard deviation
    Min_Age =  round(min(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Max_Age =  round(max(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Proportion = round(n() / nrow(Reporte_Muertes),2)  # Share of all cases
  ) %>%
  arrange(desc(Count))  # Sort by number of cases

contingency_table <- table(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`, Reporte_Muertes$`RANGO DE EDAD`)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                            
##                             (00 a 04) (05 a 09) (10 a 14) (15 a 17)
##   ACCIDENTE DE TRANSPORTE        0.07      0.03      0.16      0.74
##   NATURAL                        0.07      0.03      0.16      0.74
##   NO REGISTRA                   16.24      6.82     37.02    166.92
##   SIN INFORMACIÓN                0.21      0.09      0.49      2.21
##   VIOLENTA - ACCIDENTAL          2.07      0.87      4.73     21.32
##   VIOLENTA - HOMICIDIO          23.32      9.79     53.17    239.72
##   VIOLENTA - SIN DETERMINAR      4.72      1.98     10.76     48.53
##   VIOLENTA - SUICIDIO            3.29      1.38      7.50     33.83

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 119.78, df = 21, p-value = 7.849e-16

fisher.test(contingency_table, simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  contingency_table
## p-value = 0.0004998
## alternative hypothesis: two.sided

tab1(Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`,
     sort.group = 'decreasing', cum.percent = TRUE,horiz=T,main = 'Distribución de la manera de muerte definitiva')

## Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA` : 
##                           Frequency Percent Cum. percent
## VIOLENTA - HOMICIDIO            518    74.1         74.1
## VIOLENTA - ACCIDENTAL            71    10.2         84.3
## VIOLENTA - SUICIDIO              66     9.4         93.7
## VIOLENTA - SIN DETERMINAR        44     6.3        100.0
##   Total                         699   100.0        100.0

df <- as.data.frame(as.table(table(Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`, Reporte_Muertes$SEXO)))
crosstable(Reporte_Muertes, "SEXO", by="MANERA DE MUERTE DEFINITIVA") %>% as_flextable()

label	variable	MANERA DE MUERTE DEFINITIVA
label	variable	VIOLENTA - ACCIDENTAL	VIOLENTA - HOMICIDIO	VIOLENTA - SIN DETERMINAR	VIOLENTA - SUICIDIO
SEXO	FEMENINO	29 (21.32%)	59 (43.38%)	16 (11.76%)	32 (23.53%)
	MASCULINO	42 (7.50%)	456 (81.43%)	28 (5.00%)	34 (6.07%)
	NO DETERMINADO	0 (0%)	3 (100.00%)	0 (0%)	0 (0%)
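
The percentages in this cross-table are row percentages: each count is divided by the total for its sex. A minimal base R sketch that rebuilds them from the counts above (the object names `tab` and `row_pct`, and the shortened column labels, are introduced here for illustration):

```r
# Counts copied from the SEXO x MANERA DE MUERTE DEFINITIVA cross-table.
tab <- matrix(
  c(29,  59, 16, 32,
    42, 456, 28, 34,
     0,   3,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    SEXO   = c("FEMENINO", "MASCULINO", "NO DETERMINADO"),
    MANERA = c("ACCIDENTAL", "HOMICIDIO", "SIN DETERMINAR", "SUICIDIO")
  )
)

# margin = 1 divides every cell by its row total.
row_pct <- round(100 * prop.table(tab, margin = 1), 2)
print(row_pct)
```

Read this way, homicide accounts for \(81.43\%\) of male deaths but \(43.38\%\) of female deaths.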

ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("MANERA DE MUERTE DEFINITIVA") +
  ylab("SEXO") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5))+
  labs(title='Manera de muerte definitiva vs Sexo')

contingency_table <- table(Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`, Reporte_Muertes$SEXO)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                            
##                             FEMENINO MASCULINO NO DETERMINADO
##   VIOLENTA - ACCIDENTAL        13.81     56.88           0.30
##   VIOLENTA - HOMICIDIO        100.78    414.99           2.22
##   VIOLENTA - SIN DETERMINAR     8.56     35.25           0.19
##   VIOLENTA - SUICIDIO          12.84     52.88           0.28

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 86.29, df = 6, p-value < 2.2e-16

fisher.test(contingency_table, simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  contingency_table
## p-value = 0.0004998
## alternative hypothesis: two.sided

df <- as.data.frame(as.table(table(Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`, Reporte_Muertes$`RANGO DE EDAD`)))
ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("MANERA DE MUERTE DEFINITIVA") +
  ylab("RANGO DE EDAD") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.margin = margin(1, 1, -10, 1, "pt"))+
  labs(title='Manera de muerte definitiva vs Rango de edad')+
  scale_y_discrete(limits = rev)

crosstable(Reporte_Muertes, "RANGO DE EDAD", by="MANERA DE MUERTE DEFINITIVA") %>% as_flextable()

label	variable	MANERA DE MUERTE DEFINITIVA
label	variable	VIOLENTA - ACCIDENTAL	VIOLENTA - HOMICIDIO	VIOLENTA - SIN DETERMINAR	VIOLENTA - SUICIDIO
RANGO DE EDAD	(00 a 04)	23 (46.00%)	14 (28.00%)	13 (26.00%)	0 (0%)
	(05 a 09)	10 (47.62%)	5 (23.81%)	5 (23.81%)	1 (4.76%)
	(10 a 14)	16 (14.04%)	74 (64.91%)	8 (7.02%)	16 (14.04%)
	(15 a 17)	22 (4.28%)	425 (82.68%)	18 (3.50%)	49 (9.53%)

Reporte_Muertes %>%
  group_by(`MANERA DE MUERTE DEFINITIVA`) %>%
  summarise(
    Count = n(),  # Total number of cases
    Mean_Age = round(mean(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Mean age
    SD_Age =  round(sd(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Standard deviation
    Min_Age =  round(min(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Max_Age =  round(max(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Proportion = round(n() / nrow(Reporte_Muertes),2)  # Share of all cases
  ) %>%
  arrange(desc(Count))  # Sort by number of cases

contingency_table <- table(Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`, Reporte_Muertes$`RANGO DE EDAD`)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                            
##                             (00 a 04) (05 a 09) (10 a 14) (15 a 17)
##   VIOLENTA - ACCIDENTAL          5.08      2.13     11.58     52.21
##   VIOLENTA - HOMICIDIO          37.05     15.56     84.48    380.90
##   VIOLENTA - SIN DETERMINAR      3.15      1.32      7.18     32.35
##   VIOLENTA - SUICIDIO            4.72      1.98     10.76     48.53

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 194.64, df = 9, p-value < 2.2e-16

fisher.test(contingency_table, simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  contingency_table
## p-value = 0.0004998
## alternative hypothesis: two.sided

# Build the data frame by counting cases for each intake/definitive combination
df_sankey <- Reporte_Muertes %>%
  count(`MANERA DE MUERTE DE INGRESO`, `MANERA DE MUERTE DEFINITIVA`) 

# Draw the Sankey (alluvial) diagram with ggalluvial
ggplot(df_sankey, aes(axis1 = `MANERA DE MUERTE DE INGRESO`, 
                      axis2 = `MANERA DE MUERTE DEFINITIVA`, 
                      y = n)) +
  geom_alluvium(aes(fill = `MANERA DE MUERTE DE INGRESO`), width = 1/12) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 4) +
  scale_x_discrete(limits = c("Ingreso", "Definitiva"), expand = c(0.15, 0.15)) +
  labs(title = "Diagrama de Sankey: Manera de Muerte Ingreso vs Definitiva",
       x = "", y = "Frecuencia") +
  theme_minimal() +
  theme(legend.position = "none")

## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.

crosstable(Reporte_Muertes, "MANERA DE MUERTE DE INGRESO",
           by="MANERA DE MUERTE DEFINITIVA") %>% as_flextable()

label	variable	MANERA DE MUERTE DEFINITIVA
label	variable	VIOLENTA - ACCIDENTAL	VIOLENTA - HOMICIDIO	VIOLENTA - SIN DETERMINAR	VIOLENTA - SUICIDIO
MANERA DE MUERTE DE INGRESO	ACCIDENTE DE TRANSPORTE	1 (100.00%)	0 (0%)	0 (0%)	0 (0%)
	NATURAL	0 (0%)	0 (0%)	1 (100.00%)	0 (0%)
	NO REGISTRA	22 (9.69%)	169 (74.45%)	14 (6.17%)	22 (9.69%)
	SIN INFORMACIÓN	2 (66.67%)	1 (33.33%)	0 (0%)	0 (0%)
	VIOLENTA - ACCIDENTAL	24 (82.76%)	0 (0%)	5 (17.24%)	0 (0%)
	VIOLENTA - HOMICIDIO	2 (0.61%)	319 (97.85%)	5 (1.53%)	0 (0%)
	VIOLENTA - SIN DETERMINAR	20 (30.30%)	29 (43.94%)	13 (19.70%)	4 (6.06%)
	VIOLENTA - SUICIDIO	0 (0%)	0 (0%)	6 (13.04%)	40 (86.96%)

contingency_table <- table(Reporte_Muertes$`MANERA DE MUERTE DE INGRESO`, Reporte_Muertes$`MANERA DE MUERTE DEFINITIVA`)

chi_sq_result <- chisq.test(contingency_table)

## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect

print(round(chi_sq_result$expected,2))

##                            
##                             VIOLENTA - ACCIDENTAL VIOLENTA - HOMICIDIO
##   ACCIDENTE DE TRANSPORTE                    0.10                 0.74
##   NATURAL                                    0.10                 0.74
##   NO REGISTRA                               23.06               168.22
##   SIN INFORMACIÓN                            0.30                 2.22
##   VIOLENTA - ACCIDENTAL                      2.95                21.49
##   VIOLENTA - HOMICIDIO                      33.11               241.59
##   VIOLENTA - SIN DETERMINAR                  6.70                48.91
##   VIOLENTA - SUICIDIO                        4.67                34.09
##                            
##                             VIOLENTA - SIN DETERMINAR VIOLENTA - SUICIDIO
##   ACCIDENTE DE TRANSPORTE                        0.06                0.09
##   NATURAL                                        0.06                0.09
##   NO REGISTRA                                   14.29               21.43
##   SIN INFORMACIÓN                                0.19                0.28
##   VIOLENTA - ACCIDENTAL                          1.83                2.74
##   VIOLENTA - HOMICIDIO                          20.52               30.78
##   VIOLENTA - SIN DETERMINAR                      4.15                6.23
##   VIOLENTA - SUICIDIO                            2.90                4.34

print(chi_sq_result)

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 700.1, df = 21, p-value < 2.2e-16

fisher.test(contingency_table, simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  contingency_table
## p-value = 0.0004998
## alternative hypothesis: two.sided

(To be reviewed later.)

#tab1(Reporte_Muertes$`CAUSA DE MUERTE DE INGRESO`,
#     sort.group = 'decreasing', cum.percent = TRUE,horiz=T,main = 'Distribución de la causa de muerte de ingreso')
df <- as.data.frame(as.table(table(Reporte_Muertes$`CAUSA DE MUERTE DE INGRESO`, Reporte_Muertes$SEXO)))
crosstable(Reporte_Muertes, "SEXO", by="CAUSA DE MUERTE DE INGRESO") %>% as_flextable()

label	variable	CAUSA DE MUERTE DE INGRESO
label	variable	ABORTO	AHORCAMIENTO	CAIDA DE ALTURA	CONTUNDENTE	CORTO CONTUNDENTE	CORTO PUNZANTE	ELECTROCUCIÓN	EN ESTUDIO	ESTRANGULAMIENTO	INMERSIÓN	INTOXICACIÓN O ENVENENAMIENTO POR AGENTE QUIMICO	INTOXICACIÓN O ENVENENAMIENTO POR SOBREDOSIS DE PSICOACTIVOS	NO REGISTRA	POR DETERMINAR	PROYECTIL DE ARMA DE FUEGO	PUNZANTE	QUEMADURA POR FUEGO	QUEMADURA POR LIQUIDO CALIENTE	SIN INFORMACIÓN	SOFOCACIÓN POR FALTA DE OXIGENO	SUMERSIÓN
SEXO	FEMENINO	1 (0.74%)	16 (11.76%)	7 (5.15%)	1 (0.74%)	0 (0%)	7 (5.15%)	1 (0.74%)	0 (0%)	1 (0.74%)	2 (1.47%)	7 (5.15%)	2 (1.47%)	34 (25.00%)	29 (21.32%)	21 (15.44%)	1 (0.74%)	1 (0.74%)	1 (0.74%)	0 (0%)	1 (0.74%)	3 (2.21%)
	MASCULINO	0 (0%)	26 (4.64%)	3 (0.54%)	6 (1.07%)	1 (0.18%)	34 (6.07%)	1 (0.18%)	1 (0.18%)	1 (0.18%)	10 (1.79%)	2 (0.36%)	1 (0.18%)	192 (34.29%)	37 (6.61%)	241 (43.04%)	0 (0%)	1 (0.18%)	0 (0%)	1 (0.18%)	0 (0%)	2 (0.36%)
	NO DETERMINADO	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (33.33%)	2 (66.67%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)

ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("CAUSA DE MUERTE DE INGRESO") +
  ylab("SEXO") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5))+
  labs(title='Causa de muerte de ingreso vs Sexo')

df <- as.data.frame(as.table(table(Reporte_Muertes$`CAUSA DE MUERTE DE INGRESO`, Reporte_Muertes$`RANGO DE EDAD`)))
ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = Freq), colour = "green") +  
  scale_size_continuous(range = c(5, 30)) +         
  geom_text(aes(label = Freq), color = "black") +    
  theme_bw() +
  xlab("CAUSA DE MUERTE DE INGRESO") +
  ylab("RANGO DE EDAD") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.margin = margin(1, 1, -10, 1, "pt"))+
  labs(title='Causa de muerte de ingreso vs Rango de edad')+
  scale_y_discrete(limits = rev)

crosstable(Reporte_Muertes, "RANGO DE EDAD", by="CAUSA DE MUERTE DE INGRESO") %>% as_flextable()

label	variable	CAUSA DE MUERTE DE INGRESO
label	variable	ABORTO	AHORCAMIENTO	CAIDA DE ALTURA	CONTUNDENTE	CORTO CONTUNDENTE	CORTO PUNZANTE	ELECTROCUCIÓN	EN ESTUDIO	ESTRANGULAMIENTO	INMERSIÓN	INTOXICACIÓN O ENVENENAMIENTO POR AGENTE QUIMICO	INTOXICACIÓN O ENVENENAMIENTO POR SOBREDOSIS DE PSICOACTIVOS	NO REGISTRA	POR DETERMINAR	PROYECTIL DE ARMA DE FUEGO	PUNZANTE	QUEMADURA POR FUEGO	QUEMADURA POR LIQUIDO CALIENTE	SIN INFORMACIÓN	SOFOCACIÓN POR FALTA DE OXIGENO	SUMERSIÓN
RANGO DE EDAD	(00 a 04)	1 (2.00%)	0 (0%)	3 (6.00%)	2 (4.00%)	0 (0%)	1 (2.00%)	0 (0%)	0 (0%)	0 (0%)	2 (4.00%)	0 (0%)	0 (0%)	16 (32.00%)	14 (28.00%)	6 (12.00%)	0 (0%)	2 (4.00%)	1 (2.00%)	0 (0%)	0 (0%)	2 (4.00%)
	(05 a 09)	0 (0%)	2 (9.52%)	3 (14.29%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (4.76%)	2 (9.52%)	1 (4.76%)	9 (42.86%)	0 (0%)	3 (14.29%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
	(10 a 14)	0 (0%)	9 (7.89%)	2 (1.75%)	1 (0.88%)	0 (0%)	7 (6.14%)	2 (1.75%)	1 (0.88%)	2 (1.75%)	5 (4.39%)	0 (0%)	0 (0%)	34 (29.82%)	17 (14.91%)	33 (28.95%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (0.88%)	0 (0%)
	(15 a 17)	0 (0%)	31 (6.03%)	2 (0.39%)	4 (0.78%)	1 (0.19%)	33 (6.42%)	0 (0%)	0 (0%)	0 (0%)	4 (0.78%)	7 (1.36%)	2 (0.39%)	168 (32.68%)	37 (7.20%)	220 (42.80%)	1 (0.19%)	0 (0%)	0 (0%)	1 (0.19%)	0 (0%)	3 (0.58%)

Reporte_Muertes %>%
  group_by(`CAUSA DE MUERTE DE INGRESO`) %>%
  summarise(
    Count = n(),  # Total number of cases
    Mean_Age = round(mean(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Mean age
    SD_Age =  round(sd(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),  # Standard deviation
    Min_Age =  round(min(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Max_Age =  round(max(`EDAD CALCULADA EN AÑOS`, na.rm = TRUE),2),
    Proportion = round(n() / nrow(Reporte_Muertes),2)  # Share of all cases
  ) %>%
  arrange(desc(Count))