Introduction
Data import and cleaning
Quick quality check
Univariate descriptive statistics
Frequencies and proportions
Aggregates by month and total amount
Most frequent infraction codes
Visualizations
Interpretation and recommendations
Rendering to PDF

Introduction

This report analyzes the BD Comparendos OK$.xlsx dataset (2,120 records of traffic citations, or comparendos). It presents the main characteristics of the data, descriptive statistics, and visualizations to support interpretation.

Data import and cleaning

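library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), filter()
library(knitr)   # kable()

archivo <- "BD Comparendos OK$.xlsx"
datos <- read_excel(archivo)
names(datos) <- trimws(names(datos))  # strip stray whitespace from column names
head(datos)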

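# Convert columns to the types the analysis expects
datos$VALOR_A_PAGAR <- as.numeric(datos$VALOR_A_PAGAR)
datos$`COD. INFRACCION` <- as.factor(datos$`COD. INFRACCION`)
datos$SEXO <- as.factor(datos$SEXO)
datos$`TIPO DE VEHICULO` <- as.factor(datos$`TIPO DE VEHICULO`)
datos$`NOMBRE DEL MES` <- as.factor(datos$`NOMBRE DEL MES`)
if ("FECHA DE COMPARENDO" %in% names(datos)) {
  # The column appears to arrive as Excel serial numbers, which count days from 1899-12-30.
  # Without this origin, as.Date() counts from 1970-01-01 and returns dates in 2080.
  datos$`FECHA DE COMPARENDO` <- as.Date(datos$`FECHA DE COMPARENDO`, origin = "1899-12-30")
}
str(datos)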

## tibble [2,120 × 17] (S3: tbl_df/tbl/data.frame)
##  $ No. MANDAMIENTO DE PAGO  : chr [1:2120] "F00002658" "F00004304" "F00001544" "F00001578" ...
##  $ FECHA MANDAMIENTO DE PAGO: chr [1:2120] "19/08/2010" "19/08/2010" "19/08/2010" "19/08/2010" ...
##  $ EJECUTADO                : chr [1:2120] "RAMIRO ANTONIO PEREZ HERRERA" "TRANSPORTES MST  Y CIA S. EN C" "NANCY ANGELICA GARCIA DUARTE" "FILOMENA DEL SOCORRO MARTINEZ ROMERO" ...
##  $ TIPO DE IDENTIFICACION   : chr [1:2120] "Cedula de Ciudadanía" "Nit" "Cedula de Ciudadanía" "Cedula de Ciudadanía" ...
##  $ No. IDENTIFICACION       : num [1:2120] 7.21e+07 9.00e+09 2.27e+07 2.32e+07 2.25e+07 ...
##  $ SEXO                     : Factor w/ 4 levels "CÉDULA NUEVA",..: 3 2 4 4 4 4 4 3 2 2 ...
##  $ COD. INFRACCION          : Factor w/ 4 levels "64","67","76",..: 2 1 1 1 1 1 1 3 4 4 ...
##  $ COMPARENDO               : chr [1:2120] "F00007946" "F0000469" "F0000925" "F0000160" ...
##  $ FECHA DE COMPARENDO      : Date[1:2120], format: "2080-05-05" "2080-04-14" ...
##  $ FECHA                    : POSIXct[1:2120], format: "2010-05-04" "2010-05-05" ...
##  $ AÑO                      : num [1:2120] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ DIA                      : num [1:2120] 4 13 15 12 26 30 23 14 24 18 ...
##  $ MES                      : num [1:2120] 5 4 4 4 4 4 5 4 5 4 ...
##  $ NOMBRE DEL MES           : Factor w/ 2 levels "Abril","Mayo": 2 1 1 1 1 1 2 1 2 1 ...
##  $ PACA DE VEHICULO         : chr [1:2120] "QHO294" "UYU967" "UYZ604" "EDN17B" ...
##  $ TIPO DE VEHICULO         : Factor w/ 2 levels "CARRO","MOTO": 1 1 1 2 2 2 2 2 1 1 ...
##  $ VALOR_A_PAGAR            : num [1:2120] 438900 438900 438900 438900 438900 ...
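
The 2080 values shown for FECHA DE COMPARENDO in the output above are what results when an Excel serial date is counted from R's 1970-01-01 epoch instead of Excel's 1899-12-30 origin. A minimal sketch of the arithmetic, using a hypothetical serial value (40302) chosen only for illustration and not read from the file:

```r
# Hypothetical Excel serial number, chosen for illustration (not read from the file)
serial <- 40302

# Counting from the Unix epoch lands about 70 years too late
fecha_unix <- as.Date(serial, origin = "1970-01-01")

# Counting from Excel's origin gives the intended calendar date
fecha_excel <- as.Date(serial, origin = "1899-12-30")

print(fecha_unix)   # 2080-05-05
print(fecha_excel)  # 2010-05-04

# The gap between the two origins is constant
as.numeric(fecha_unix - fecha_excel)  # 25569 days
```

The 2010-05-04 result agrees with the AÑO, MES and DIA values of the first record (2010, 5, 4), which suggests the shifted dates are a conversion issue rather than a data-entry one.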

Quick quality check

dim(datos)

## [1] 2120   17

# Count missing values (NA) in each column
na_por_col <- sapply(datos, function(x) sum(is.na(x)))
kable(as.data.frame(na_por_col), caption="Missing values per column")

Missing values per column
	na_por_col
No. MANDAMIENTO DE PAGO	0
FECHA MANDAMIENTO DE PAGO	0
EJECUTADO	0
TIPO DE IDENTIFICACION	0
No. IDENTIFICACION	0
SEXO	0
COD. INFRACCION	0
COMPARENDO	0
FECHA DE COMPARENDO	0
FECHA	0
AÑO	0
DIA	0
MES	0
NOMBRE DEL MES	0
PACA DE VEHICULO	0
TIPO DE VEHICULO	0
VALOR_A_PAGAR	0
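
A zero count from is.na() does not rule out blank or placeholder entries in text columns, since those are ordinary strings. The following is only a sketch on a hypothetical data frame (the `ejemplo` object and its values are invented for illustration) showing one way to count empty or whitespace-only strings alongside true NA values:

```r
# Hypothetical columns for illustration; the values are not taken from the file
ejemplo <- data.frame(
  SEXO  = c("HOMBRE", "", "MUJER", "  ", NA),
  PLACA = c("ABC123", "N/A", "XYZ987", "DEF456", ""),
  stringsAsFactors = FALSE
)

# is.na() only sees true NA values
na_reales <- colSums(is.na(ejemplo))

# Count empty or whitespace-only strings as well
vacios <- sapply(ejemplo, function(x) sum(!is.na(x) & trimws(x) == ""))

print(na_reales)  # SEXO 1, PLACA 0
print(vacios)     # SEXO 2, PLACA 1
```

Placeholder codes such as "N/A" would need their own explicit check, because they are neither NA nor blank.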

# All rows whose ID and citation date also appear in at least one other row
duplicados <- datos %>% group_by(`No. IDENTIFICACION`, `FECHA DE COMPARENDO`) %>%
  filter(n() > 1) %>% ungroup()
nrow(duplicados)

## [1] 330
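
Note that filtering on n() > 1 keeps every row of a repeated group, so 330 is the number of rows involved in repetitions, not the number of surplus copies. A base R sketch on hypothetical records (the `ejemplo` data frame is invented for illustration) shows how the two counts differ:

```r
# Hypothetical records for illustration only
ejemplo <- data.frame(
  id    = c(1, 1, 2, 3, 3, 3),
  fecha = as.Date(c("2010-04-13", "2010-04-13", "2010-04-14",
                    "2010-05-04", "2010-05-04", "2010-05-04"))
)

# Rows belonging to any repeated (id, fecha) pair: what filter(n() > 1) keeps
en_grupo_repetido <- duplicated(ejemplo) | duplicated(ejemplo, fromLast = TRUE)

# Rows that repeat an earlier row: the surplus copies
copias_extra <- duplicated(ejemplo)

sum(en_grupo_repetido)                   # 5
sum(copias_extra)                        # 3
nrow(unique(ejemplo[copias_extra, ]))    # 2 distinct repeated pairs
```

Whether a repeated pair is a true duplicate or two separate citations on the same day depends on the other columns, which this sketch does not examine.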

Univariate descriptive statistics

summary(datos)

##  No. MANDAMIENTO DE PAGO FECHA MANDAMIENTO DE PAGO  EJECUTADO        
##  Length:2120             Length:2120               Length:2120       
##  Class :character        Class :character          Class :character  
##  Mode  :character        Mode  :character          Mode  :character  
##                                                                      
##                                                                      
##                                                                      
##  TIPO DE IDENTIFICACION No. IDENTIFICACION            SEXO     COD. INFRACCION
##  Length:2120            Min.   :2.358e+05   CÉDULA NUEVA: 88   64:1626        
##  Class :character       1st Qu.:3.273e+07   EMPRESA     :472   67: 105        
##  Mode  :character       Median :7.214e+07   HOMBRE      :615   76:   4        
##                         Mean   :8.476e+08   MUJER       :945   77: 385        
##                         3rd Qu.:8.002e+08                                     
##                         Max.   :9.003e+09                                     
##   COMPARENDO        FECHA DE COMPARENDO      FECHA                    
##  Length:2120        Min.   :2080-04-13   Min.   :2010-05-04 00:00:00  
##  Class :character   1st Qu.:2080-04-28   1st Qu.:2011-10-15 18:00:00  
##  Mode  :character   Median :2080-05-07   Median :2013-03-28 12:00:00  
##                     Mean   :2080-05-06   Mean   :2013-03-28 12:00:00  
##                     3rd Qu.:2080-05-17   3rd Qu.:2014-09-09 06:00:00  
##                     Max.   :2080-05-28   Max.   :2016-02-21 00:00:00  
##       AÑO            DIA             MES        NOMBRE DEL MES
##  Min.   :2010   Min.   : 1.00   Min.   :4.000   Abril: 712    
##  1st Qu.:2010   1st Qu.: 9.00   1st Qu.:4.000   Mayo :1408    
##  Median :2010   Median :16.00   Median :5.000                 
##  Mean   :2010   Mean   :15.76   Mean   :4.664                 
##  3rd Qu.:2010   3rd Qu.:23.00   3rd Qu.:5.000                 
##  Max.   :2010   Max.   :30.00   Max.   :5.000                 
##  PACA DE VEHICULO   TIPO DE VEHICULO VALOR_A_PAGAR   
##  Length:2120        CARRO:1938       Min.   :438900  
##  Class :character   MOTO : 182       1st Qu.:438900  
##  Mode  :character                    Median :438900  
##                                      Mean   :438900  
##                                      3rd Qu.:438900  
##                                      Max.   :438900

val <- datos$VALOR_A_PAGAR
estad_valor <- data.frame(
  N = length(val),
  Media = mean(val, na.rm=TRUE),
  Mediana = median(val, na.rm=TRUE),
  SD = sd(val, na.rm=TRUE),
  Min = min(val, na.rm=TRUE),
  # names = FALSE keeps the quantile label ("25%") from becoming the row name
  Q1 = quantile(val, 0.25, na.rm=TRUE, names=FALSE),
  Q3 = quantile(val, 0.75, na.rm=TRUE, names=FALSE),
  Max = max(val, na.rm=TRUE)
)
kable(estad_valor, digits = 2, caption = "Statistics for VALOR_A_PAGAR")

Statistics for VALOR_A_PAGAR
N	Media	Mediana	SD	Min	Q1	Q3	Max
2120	438900	438900	0	438900	438900	438900	438900
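
A standard deviation of 0 with identical minimum, quartiles and maximum means the column holds a single distinct value. As a sketch, constant columns can be flagged before computing statistics or drawing plots; the `ejemplo` data frame below is hypothetical and only reuses the 438900 value seen above:

```r
# Hypothetical data frame for illustration; only the 438900 value echoes the report
ejemplo <- data.frame(
  VALOR_A_PAGAR = rep(438900, 5),
  DIA           = c(4, 13, 15, 12, 26)
)

# A column is constant when it has exactly one distinct non-missing value
es_constante <- sapply(ejemplo, function(x) length(unique(x[!is.na(x)])) == 1)

print(es_constante)            # VALOR_A_PAGAR TRUE, DIA FALSE
names(ejemplo)[es_constante]   # "VALOR_A_PAGAR"
```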

Frequencies and proportions

tab_sexo <- table(datos$SEXO)
por_sexo <- round(prop.table(tab_sexo)*100, 2)
kable(as.data.frame(tab_sexo), caption="Count by SEXO")

Count by SEXO
Var1	Freq
CÉDULA NUEVA	88
EMPRESA	472
HOMBRE	615
MUJER	945

kable(as.data.frame(por_sexo), caption="Percentage by SEXO (%)")

Percentage by SEXO (%)
Var1	Freq
CÉDULA NUEVA	4.15
EMPRESA	22.26
HOMBRE	29.01
MUJER	44.58

tab_veh <- table(datos$`TIPO DE VEHICULO`)
kable(as.data.frame(tab_veh), caption="Count by TIPO DE VEHICULO")

Count by TIPO DE VEHICULO
Var1	Freq
CARRO	1938
MOTO	182

tab_cruz <- table(datos$SEXO, datos$`TIPO DE VEHICULO`)
kable(as.data.frame.matrix(tab_cruz), caption="Citations by SEXO and TIPO DE VEHICULO")

Citations by SEXO and TIPO DE VEHICULO
	CARRO	MOTO
CÉDULA NUEVA	81	7
EMPRESA	455	17
HOMBRE	526	89
MUJER	876	69

Aggregates by month and total amount

tab_mes <- table(datos$`NOMBRE DEL MES`)
kable(as.data.frame(tab_mes), caption="Count by month")

Count by month
Var1	Freq
Abril	712
Mayo	1408

suma_mes <- aggregate(VALOR_A_PAGAR ~ `NOMBRE DEL MES`, datos, sum)
kable(suma_mes, caption="Sum of VALOR_A_PAGAR by month")

Sum of VALOR_A_PAGAR by month
NOMBRE DEL MES	VALOR_A_PAGAR
Abril	312496800
Mayo	617971200
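
Because VALOR_A_PAGAR is the same in every record, each monthly total is simply the citation count times the fixed amount. A short check using only the figures reported in the tables above (the variable names `conteo_mes` and `valor_fijo` are introduced here for illustration):

```r
# Counts per month and the fixed fine, as reported in the tables above
conteo_mes <- c(Abril = 712, Mayo = 1408)
valor_fijo <- 438900

totales <- conteo_mes * valor_fijo
print(totales)  # Abril 312496800, Mayo 617971200

# Ratio of May to April citations
round(conteo_mes[["Mayo"]] / conteo_mes[["Abril"]], 2)  # 1.98
```

The monthly sums therefore carry no information beyond the counts themselves.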

Most frequent infraction codes

conteo_cod <- table(datos$`COD. INFRACCION`)
top_cod <- names(conteo_cod)[which.max(conteo_cod)]  # code with the highest count
top_cod_count <- max(conteo_cod)
kable(as.data.frame(conteo_cod), caption="Frequency by COD. INFRACCION")

Frequency by COD. INFRACCION
Var1	Freq
64	1626
67	105
76	4
77	385

cat("Most common infraction code:", top_cod, "with", top_cod_count, "occurrences.\n")

## Most common infraction code: 64 with 1626 occurrences.
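
The share of each code follows directly from the frequency table. A sketch that rebuilds the counts by hand from the table above, rather than from the file, and converts them to percentages (`pct_cod` is a name introduced here):

```r
# Counts copied from the frequency table above
conteo_cod <- c("64" = 1626, "67" = 105, "76" = 4, "77" = 385)

# Percentage of all citations per code, rounded to one decimal
pct_cod <- round(100 * conteo_cod / sum(conteo_cod), 1)
print(pct_cod)  # 64: 76.7, 67: 5.0, 76: 0.2, 77: 18.2
```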

Visualizations

hist(val, main="Histogram of VALOR_A_PAGAR", xlab="VALOR_A_PAGAR", breaks=20)

boxplot(val, main="Boxplot of VALOR_A_PAGAR", ylab="VALOR_A_PAGAR")

barplot(tab_sexo, main="Citations by SEXO", ylab="Frequency")

barplot(as.matrix(tab_cruz), beside=TRUE, legend.text=TRUE, main="Citations by SEXO and TIPO DE VEHICULO")

Interpretation and recommendations

VALOR_A_PAGAR is constant (438,900 in every record). If this reflects a fixed fine, the data are correct; otherwise, review the data capture or the import format.
Code 64 accounts for most infractions (1,626 of 2,120, about 76.7%), so its regulatory definition is worth reviewing.
May has almost twice as many citations as April (1,408 vs. 712); investigate whether enforcement campaigns or regulatory changes explain the difference.
Possible follow-up analyses: a chi-square test of SEXO against TIPO DE VEHICULO, and t-tests for means if additional numeric variables become available.
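
The chi-square test suggested above could be sketched as follows. The counts are typed in from the SEXO by TIPO DE VEHICULO table in this report; the object names are illustrative, and this is one possible setup rather than the analysis actually run:

```r
# Counts copied from the cross-tabulation in this report
tab_cruz <- matrix(
  c( 81,  7,
    455, 17,
    526, 89,
    876, 69),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    SEXO = c("CÉDULA NUEVA", "EMPRESA", "HOMBRE", "MUJER"),
    VEHICULO = c("CARRO", "MOTO")
  )
)

# Pearson chi-square test of independence (3 degrees of freedom for a 4 x 2 table)
prueba <- chisq.test(tab_cruz)

# Expected counts under independence; all should exceed 5 for the approximation
print(prueba$expected)
print(prueba)
```

Since CÉDULA NUEVA and EMPRESA are not sexes, it may be more meaningful to restrict the table to the HOMBRE and MUJER rows before testing.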

Rendering to PDF

# install.packages("tinytex")
# tinytex::install_tinytex()
rmarkdown::render("Informe_Comparendos.Rmd", output_format = "pdf_document")

Statistical Report: BD Comparendos

Automated Analysis, generated by Diomedes Buelvas

2025-11-06