Exploring the datasets (2023 and 2024)
# Load the 2023 dataset
M23 = read.csv("Datos_molec_2023-1.csv")
# Load the 2024 dataset
M24 = read.csv("Datos_molec_2024-1.csv")
# Structure of the 2023 dataset, for an initial exploration
str(M23)
## 'data.frame': 1996 obs. of 108 variables:
## $ folio : chr "11B213" "11B213" "11B213" "11B213" ...
## $ entidad : int 1 1 1 1 1 1 1 1 1 1 ...
## $ control : int 40003 40003 40003 40003 40007 40007 40007 40007 40034 40034 ...
## $ viv_sel : int 2 1 4 3 4 3 2 1 4 1 ...
## $ num_hog : int 1 1 1 1 1 1 1 1 1 1 ...
## $ hog_mud : int 0 0 0 0 0 0 0 0 0 0 ...
## $ n_ren_el: int 1 1 1 2 1 2 2 1 2 2 ...
## $ cd : int 14 14 14 14 14 14 14 14 14 14 ...
## $ periodo : int 223 223 223 223 223 223 223 223 223 223 ...
## $ sexo : int 1 1 2 2 1 2 2 1 2 2 ...
## $ edad : int 29 53 65 44 52 36 74 53 23 57 ...
## $ nivel : int 4 3 2 3 6 3 2 3 7 4 ...
## $ anio : int 3 3 6 3 3 3 6 3 4 3 ...
## $ cond_act: int 1 1 7 7 1 1 7 1 1 1 ...
## $ p1 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ p2 : int 2 2 2 2 2 2 1 2 1 2 ...
## $ p3_1 : int 2 2 2 2 2 2 1 2 1 2 ...
## $ p3_2 : int 2 2 2 2 2 2 2 2 2 2 ...
## $ p3_3 : int 2 1 2 2 2 2 2 2 2 2 ...
## $ p3_4 : int 2 2 2 2 2 2 2 2 2 2 ...
## $ p3_5 : int 2 2 2 2 1 2 2 1 1 2 ...
## $ p4 : int 0 0 0 0 0 0 2 0 3 0 ...
## $ p5 : int 0 0 0 0 0 0 4 0 2 0 ...
## $ p5_6esp : chr "" "" "" "" ...
## $ p6_1 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p6_2 : int 0 0 0 0 0 0 2 0 1 0 ...
## $ p6_3 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p6_4 : int 0 0 0 0 0 0 1 0 2 0 ...
## $ p6_5 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p6_6 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p6_6esp : chr "" "" "" "" ...
## $ p7 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p7_3 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p8_1 : int 0 0 0 0 0 0 2 0 2 0 ...
## $ p8_2 : int 0 0 0 0 0 0 1 0 1 0 ...
## $ p9 : int 0 0 0 0 0 0 1 0 1 0 ...
## $ p9_5esp : chr "" "" "" "" ...
## $ p10 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p11 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p11_6esp: chr "" "" "" "" ...
## $ p12_1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_3 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_4 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_5 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_6 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_7 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_8 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_9 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p12_9esp: chr "" "" "" "" ...
## $ p13 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p13_3 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p14_1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p14_2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p15 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p15_5esp: chr "" "" "" "" ...
## $ p16 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p17 : int 0 4 0 0 0 0 0 0 0 0 ...
## $ p17_6esp: chr "" "" "" "" ...
## $ p18_1 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p18_2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p18_3 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p18_4 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p18_5 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p19 : int 0 2 0 0 0 0 0 0 0 0 ...
## $ p19_3 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p20_1 : int 0 2 0 0 0 0 0 0 0 0 ...
## $ p20_2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ p21 : int 0 4 0 0 0 0 0 0 0 0 ...
## $ p21_5esp: chr "" "" "" "" ...
## $ p22 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p23_1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p23_2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p24 : int 0 0 0 0 2 0 0 5 2 0 ...
## $ p25 : int 0 0 0 0 1 0 0 4 2 0 ...
## $ p25_6esp: chr "" "" "" "" ...
## $ p26 : int 0 10 0 0 20 0 30 15 30 0 ...
## $ p27 : int 0 2 0 0 2 0 2 2 2 0 ...
## $ p28 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ p28_7esp: chr "" "" "" "" ...
## $ p29 : int 0 2 0 0 2 0 2 2 2 0 ...
## $ p30 : int 0 2 0 0 3 0 3 3 3 0 ...
## $ p31 : int 0 2 0 0 1 0 2 2 1 0 ...
## $ p32 : int 3 0 1 2 0 3 0 0 0 2 ...
## $ p32_6esp: chr "" "" "" "" ...
## $ p33_1 : int 2 2 2 2 2 2 2 2 1 2 ...
## $ p33_2 : int 2 2 2 2 2 2 2 2 2 2 ...
## $ p33_3 : int 2 2 2 2 2 2 2 2 1 2 ...
## $ p33_4 : int 2 2 2 2 2 2 2 2 1 2 ...
## $ p34_1 : int 2 2 2 2 2 2 2 2 1 2 ...
## $ p34_2 : int 2 1 2 2 1 2 1 1 1 2 ...
## $ p34_3 : int 2 2 2 2 1 2 2 1 1 2 ...
## $ p34_3_1 : int 0 0 0 0 2 0 0 1 2 0 ...
## $ p34_4 : int 1 1 1 1 1 1 1 1 1 2 ...
## $ p34_4_1 : int 1 1 1 1 1 1 1 1 1 0 ...
## $ p35 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ p36_1 : int 1 2 2 1 1 1 2 1 1 1 ...
## $ p36_2 : int 1 1 2 1 1 1 2 1 1 1 ...
## $ p36_3 : int 1 2 2 1 1 1 2 1 1 1 ...
## [list output truncated]
Most variables are qualitative (categorical) but are stored as integer codes. The context of the problem determines which ones are actually numeric.
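A minimal sketch of how coded categorical columns could be turned into R factors once their meaning is confirmed. The toy data frame is hypothetical, and the labels "Hombre"/"Mujer" for `sexo` are an assumption that must be checked against the questionnaire descriptor; `edad` is treated as a genuinely numeric variable.

```r
# Toy data frame mimicking a few columns of M23 (hypothetical values)
toy <- data.frame(
  sexo     = c(1L, 2L, 2L, 1L),
  edad     = c(29L, 53L, 65L, 44L),
  cond_act = c(1L, 1L, 7L, 7L)
)

# sexo: codes 1 and 2 with ASSUMED labels; confirm them in the descriptor
toy$sexo <- factor(toy$sexo, levels = c(1, 2), labels = c("Hombre", "Mujer"))

# cond_act: keep the numeric codes as factor levels until the labels are known
toy$cond_act <- factor(toy$cond_act)

# edad is a real numeric variable, so it stays as an integer
str(toy)
```

Once a column is a factor, `summary()` reports counts per category instead of a mean and quartiles, which is what makes sense for these variables.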
Dictionaries
# Load the 2023 dictionary
Dicc23 = read.csv("Diccionario_molec_2023-1.csv")
head(Dicc23)
## NOMBRE_CAMPO TIPO NEMÓNICO CATÁLOGO
## 1 Folio C folio
## 2 Entidad C entidad entidad
## 3 Control C control
## 4 Vivienda seleccionada C viv_sel viv_sel
## 5 Número de hogar C num_hog num_hog
## 6 hogar mudado C hog_mud hog_mud
## RANGO_CLAVES
## 1 1 Zona, 1...3 Estrato, Últimos 4 panel de rotación
## 2 01...32
## 3 40001...41398
## 4 1...9
## 5 1...9
## 6 0...3
The dictionary explains what each variable means in the context of the problem, but not what its category codes stand for. That information has to be read from the question descriptor file and from the questionnaire.
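The dictionary can still be used as a lookup table from variable codes to descriptive names. This sketch builds a small toy dictionary from the first rows printed above; it uses the ASCII column name `NEMONICO` because the real header is `NEMÓNICO`, and the name `labels_by_code` is purely illustrative.

```r
# Toy dictionary built from the first rows shown above.
# The real header of the mnemonic column is "NEMÓNICO"; ASCII is used here.
dicc <- data.frame(
  NOMBRE_CAMPO = c("Folio", "Entidad", "Control", "Vivienda seleccionada"),
  TIPO         = c("C", "C", "C", "C"),
  NEMONICO     = c("folio", "entidad", "control", "viv_sel"),
  stringsAsFactors = FALSE
)

# Named lookup vector: variable code -> descriptive field name
labels_by_code <- setNames(dicc$NOMBRE_CAMPO, dicc$NEMONICO)

# Descriptions for a few columns of the dataset
labels_by_code[c("control", "viv_sel")]
```

With the real file, `setNames(Dicc23[[1]], Dicc23[[3]])` would select the same two columns by position, which avoids typing the accented header.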
Summary of the variables (2023)
summary(M23)
## folio entidad control viv_sel num_hog
## Length:1996 Min. : 1.00 Min. :40001 Min. :1.000 Min. :1
## Class :character 1st Qu.: 9.00 1st Qu.:40100 1st Qu.:2.000 1st Qu.:1
## Mode :character Median :15.00 Median :40199 Median :2.000 Median :1
## Mean :15.61 Mean :40253 Mean :2.486 Mean :1
## 3rd Qu.:21.00 3rd Qu.:40335 3rd Qu.:3.000 3rd Qu.:1
## Max. :32.00 Max. :41398 Max. :4.000 Max. :1
##
## hog_mud n_ren_el cd periodo
## Min. :0.00000 Min. : 1.000 Min. : 1.00 Min. :223
## 1st Qu.:0.00000 1st Qu.: 1.000 1st Qu.: 2.00 1st Qu.:223
## Median :0.00000 Median : 1.000 Median : 9.00 Median :223
## Mean :0.05561 Mean : 1.667 Mean :13.85 Mean :223
## 3rd Qu.:0.00000 3rd Qu.: 2.000 3rd Qu.:25.00 3rd Qu.:223
## Max. :3.00000 Max. :10.000 Max. :43.00 Max. :223
##
## sexo edad nivel anio
## Min. :1.000 Min. :18.00 Min. : 0.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:32.00 1st Qu.: 3.000 1st Qu.:3.000
## Median :2.000 Median :45.00 Median : 4.000 Median :3.000
## Mean :1.585 Mean :46.21 Mean : 4.709 Mean :3.521
## 3rd Qu.:2.000 3rd Qu.:59.00 3rd Qu.: 7.000 3rd Qu.:4.000
## Max. :2.000 Max. :97.00 Max. :99.000 Max. :7.000
## NA's :48
## cond_act p1 p2 p3_1
## Min. : 1.000 Min. :1.000 Min. :0.000 Min. :0.000
## 1st Qu.: 1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median : 1.000 Median :1.000 Median :1.000 Median :2.000
## Mean : 3.269 Mean :1.019 Mean :1.419 Mean :1.565
## 3rd Qu.: 7.000 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :10.000 Max. :2.000 Max. :2.000 Max. :2.000
##
## p3_2 p3_3 p3_4 p3_5
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.00
## Median :2.000 Median :2.000 Median :2.000 Median :2.00
## Mean :1.742 Mean :1.784 Mean :1.909 Mean :1.61
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.00
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.00
##
## p4 p5 p5_6esp p6_1
## Min. : 0.000 Min. :0.00 Length:1996 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.00 Class :character 1st Qu.:0.0000
## Median : 0.000 Median :0.00 Mode :character Median :0.0000
## Mean : 1.381 Mean :1.34 Mean :0.7665
## 3rd Qu.: 2.000 3rd Qu.:3.00 3rd Qu.:2.0000
## Max. :80.000 Max. :6.00 Max. :2.0000
##
## p6_2 p6_3 p6_4 p6_5
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.6784 Mean :0.6759 Mean :0.6473 Mean :0.6979
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:2.0000
## Max. :2.0000 Max. :2.0000 Max. :2.0000 Max. :2.0000
##
## p6_6 p6_6esp p7 p7_3
## Min. :0.0000 Length:1996 Min. :0.0000 Min. : 0
## 1st Qu.:0.0000 Class :character 1st Qu.:0.0000 1st Qu.: 0
## Median :0.0000 Mode :character Median :0.0000 Median : 0
## Mean :0.7956 Mean :0.8682 Mean : 2137
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.: 0
## Max. :2.0000 Max. :3.0000 Max. :999999
##
## p8_1 p8_2 p9 p9_5esp
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Length:1996
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 Class :character
## Median :0.0000 Median :0.0000 Median :0.0000 Mode :character
## Mean :0.6904 Mean :0.4749 Mean :0.4594
## 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :2.0000 Max. :2.0000 Max. :5.0000
##
## p10 p11 p11_6esp p12_1
## Min. : 0.000 Min. :0.0000 Length:1996 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 Class :character 1st Qu.:0.0000
## Median : 0.000 Median :0.0000 Mode :character Median :0.0000
## Mean : 0.756 Mean :0.7771 Mean :0.4254
## 3rd Qu.: 0.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :70.000 Max. :6.0000 Max. :2.0000
##
## p12_2 p12_3 p12_4 p12_5
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.4088 Mean :0.4013 Mean :0.4018 Mean :0.4143
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :2.0000 Max. :2.0000 Max. :2.0000 Max. :2.0000
##
## p12_6 p12_7 p12_8 p12_9
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.4008 Mean :0.3978 Mean :0.3798 Mean :0.4409
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :2.0000 Max. :2.0000 Max. :2.0000 Max. :2.0000
##
## p12_9esp p13 p13_3 p14_1
## Length:1996 Min. :0.0000 Min. : 0.00 Min. :0.0000
## Class :character 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.:0.0000
## Mode :character Median :0.0000 Median : 0.00 Median :0.0000
## Mean :0.4755 Mean : 12.73 Mean :0.3988
## 3rd Qu.:0.0000 3rd Qu.: 0.00 3rd Qu.:0.0000
## Max. :3.0000 Max. :3500.00 Max. :2.0000
##
## p14_2 p15 p15_5esp p16
## Min. :0.000 Min. :0.0000 Length:1996 Min. : 0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 Class :character 1st Qu.: 0.0000
## Median :0.000 Median :0.0000 Mode :character Median : 0.0000
## Mean :0.256 Mean :0.2956 Mean : 0.5792
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.: 0.0000
## Max. :2.000 Max. :5.0000 Max. :50.0000
##
## p17 p17_6esp p18_1 p18_2
## Min. :0.0000 Length:1996 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 Class :character 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Mode :character Median :0.0000 Median :0.0000
## Mean :0.5842 Mean :0.2244 Mean :0.2355
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :6.0000 Max. :2.0000 Max. :2.0000
##
## p18_3 p18_4 p18_5 p19
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.000 Median :0.0000 Median :0.0000
## Mean :0.235 Mean :0.254 Mean :0.2129 Mean :0.4228
## 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :2.000 Max. :2.000 Max. :2.0000 Max. :3.0000
##
## p19_3 p20_1 p20_2 p21
## Min. : 0 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0 Median :0.0000 Median :0.0000 Median :0.0000
## Mean : 1006 Mean :0.3307 Mean :0.2039 Mean :0.2525
## 3rd Qu.: 0 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :999999 Max. :2.0000 Max. :2.0000 Max. :5.0000
##
## p21_5esp p22 p23_1 p23_2
## Length:1996 Min. :0.0000 Min. :0.00000 Min. :0.00000
## Class :character 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Mode :character Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1658 Mean :0.07415 Mean :0.08417
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :5.0000 Max. :2.00000 Max. :2.00000
##
## p24 p25 p25_6esp p26
## Min. :0.0000 Min. :0.000 Length:1996 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.:0.000 Class :character 1st Qu.: 0.00
## Median :0.0000 Median :0.000 Mode :character Median : 20.00
## Mean :0.8091 Mean :1.053 Mean : 27.44
## 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.: 30.00
## Max. :5.0000 Max. :6.000 Max. :480.00
##
## p27 p28 p28_7esp p29 p30
## Min. :0.000 Min. :0.0000 Length:1996 Min. :0.00 Min. :0
## 1st Qu.:0.000 1st Qu.:0.0000 Class :character 1st Qu.:0.00 1st Qu.:0
## Median :2.000 Median :0.0000 Mode :character Median :2.00 Median :3
## Mean :1.195 Mean :0.6012 Mean :1.62 Mean :2
## 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.:3.00 3rd Qu.:3
## Max. :2.000 Max. :7.0000 Max. :4.00 Max. :4
##
## p31 p32 p32_6esp p33_1
## Min. :0.000 Min. :0.0000 Length:1996 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 Class :character 1st Qu.:2.000
## Median :1.000 Median :0.0000 Mode :character Median :2.000
## Mean :1.034 Mean :0.8888 Mean :1.804
## 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:2.000
## Max. :2.000 Max. :6.0000 Max. :2.000
##
## p33_2 p33_3 p33_4 p34_1
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
## Median :2.000 Median :2.000 Median :2.000 Median :2.000
## Mean :1.868 Mean :1.902 Mean :1.843 Mean :1.691
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :3.000
##
## p34_2 p34_3 p34_3_1 p34_4
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:1.000
## Median :1.000 Median :2.000 Median :0.0000 Median :1.000
## Mean :1.487 Mean :1.649 Mean :0.5616 Mean :1.432
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:2.000
## Max. :3.000 Max. :3.000 Max. :3.0000 Max. :3.000
##
## p34_4_1 p35 p36_1 p36_2
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:1.0000 1st Qu.:1.000 1st Qu.:1.000
## Median :1.000 Median :1.0000 Median :1.000 Median :1.000
## Mean :0.999 Mean :0.9895 Mean :1.421 Mean :1.228
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.000
## Max. :6.000 Max. :2.0000 Max. :3.000 Max. :3.000
##
## p36_3 p36_4 factor h_lec
## Min. :0.000 Min. :0.000 Min. : 1634 Min. :0.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 12854 1st Qu.:1.000
## Median :1.000 Median :1.000 Median : 18528 Median :1.000
## Mean :1.337 Mean :1.444 Mean : 21120 Mean :2.182
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.: 26649 3rd Qu.:4.000
## Max. :3.000 Max. :3.000 Max. :105760 Max. :4.000
##
## mat_lec perslec l_format r_format
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :3.000 Median :1.000 Median :0.0000 Median :0.0000
## Mean :2.752 Mean :1.307 Mean :0.7495 Mean :0.4148
## 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:0.0000
## Max. :4.000 Max. :2.000 Max. :3.0000 Max. :3.0000
##
## p_format perslecl
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:1.000
## Median :0.0000 Median :1.000
## Mean :0.3347 Mean :1.398
## 3rd Qu.:0.0000 3rd Qu.:2.000
## Max. :3.0000 Max. :2.000
##
summary() computes numeric measures (minimum, quartiles, median, mean, maximum) for every integer column, but for most variables these figures are meaningless because the variables are categorical. The summary is still useful to see whether a variable has missing values (NA's); in the output above, only anio does (48).
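A more direct way to locate missing values, and a more informative description of categorical codes, could look like the sketch below. The data frame is a hypothetical toy; on the real data, `colSums(is.na(M23))` would list the NA count of every column at once.

```r
# Toy data with some missing values (hypothetical)
toy <- data.frame(
  sexo     = c(1L, 2L, 2L, 1L, 2L),
  anio     = c(3L, NA, 6L, 3L, NA),
  cond_act = c(1L, 1L, 7L, 7L, 1L)
)

# Number of NA's in each column
na_per_col <- colSums(is.na(toy))
na_per_col

# Keep only the columns that actually have missing values
na_per_col[na_per_col > 0]

# For a categorical code, a frequency table says more than the mean or median
freq <- table(toy$cond_act, useNA = "ifany")
freq
```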
Variable selection
The selection of columns 3 to 34 from the original code (from control to p8_1) is kept and applied to the new datasets:
# Select columns 3 to 34 (control to p8_1) for 2023
M23selec = M23[3:34]
# Select columns 3 to 34 (control to p8_1) for 2024
M24selec = M24[3:34]
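Selecting by position assumes that both years keep exactly the same column order. A sketch of the same selection done by name, on a hypothetical toy data frame whose first columns follow the order of M23, is:

```r
# Toy data frame with a few columns in the same order as M23 (hypothetical)
toy <- data.frame(folio = "11B213", entidad = 1L, control = 40003L,
                  viv_sel = 2L, p8_1 = 0L, p8_2 = 0L)

# Find the first and last variables of interest by name, not by position
first <- match("control", names(toy))
last  <- match("p8_1", names(toy))
stopifnot(!is.na(first), !is.na(last))

toy_selec <- toy[, first:last]
names(toy_selec)
```

After selecting, `identical(names(M23selec), names(M24selec))` is a quick check that both years ended up with the same variables.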
Missing data (2024)
Check whether the variables of interest in the 2024 dataset have missing values:
summary(M24selec)
## control viv_sel num_hog hog_mud n_ren_el
## Min. :40001 Min. :1.000 Min. :1 Min. :0.00000 Min. :1.000
## 1st Qu.:40095 1st Qu.:1.750 1st Qu.:1 1st Qu.:0.00000 1st Qu.:1.000
## Median :40191 Median :2.000 Median :1 Median :0.00000 Median :1.000
## Mean :40244 Mean :2.493 Mean :1 Mean :0.03472 Mean :1.673
## 3rd Qu.:40315 3rd Qu.:3.000 3rd Qu.:1 3rd Qu.:0.00000 3rd Qu.:2.000
## Max. :41419 Max. :4.000 Max. :1 Max. :2.00000 Max. :8.000
##
## cd periodo sexo edad nivel
## Min. : 1.00 Min. :224 Min. :1.000 Min. :18.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.:224 1st Qu.:1.000 1st Qu.:33.00 1st Qu.: 3.00
## Median : 9.00 Median :224 Median :2.000 Median :46.00 Median : 4.00
## Mean :13.74 Mean :224 Mean :1.574 Mean :46.49 Mean : 4.66
## 3rd Qu.:25.00 3rd Qu.:224 3rd Qu.:2.000 3rd Qu.:59.00 3rd Qu.: 7.00
## Max. :43.00 Max. :224 Max. :2.000 Max. :94.00 Max. :99.00
##
## anio cond_act p1 p2
## Min. :1.000 Min. : 1.000 Min. :1.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.: 1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :3.000 Median : 1.000 Median :1.000 Median :1.000
## Mean :3.456 Mean : 3.325 Mean :1.023 Mean :1.405
## 3rd Qu.:4.000 3rd Qu.: 7.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :9.000 Max. :10.000 Max. :2.000 Max. :2.000
## NA's :48
## p3_1 p3_2 p3_3 p3_4
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.000 Median :2.000 Median :2.000
## Mean :1.552 Mean :1.745 Mean :1.781 Mean :1.912
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
##
## p3_5 p4 p5 p5_6esp
## Min. :0.000 Min. : 0.000 Min. :0.000 Length:2016
## 1st Qu.:1.000 1st Qu.: 0.000 1st Qu.:0.000 Class :character
## Median :2.000 Median : 0.000 Median :0.000 Mode :character
## Mean :1.588 Mean : 1.275 Mean :1.348
## 3rd Qu.:2.000 3rd Qu.: 2.000 3rd Qu.:3.000
## Max. :2.000 Max. :70.000 Max. :6.000
##
## p6_1 p6_2 p6_3 p6_4
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.000 Median :0.0000 Median :0.0000
## Mean :0.7723 Mean :0.686 Mean :0.6756 Mean :0.6458
## 3rd Qu.:2.0000 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:1.0000
## Max. :2.0000 Max. :2.000 Max. :2.0000 Max. :2.0000
##
## p6_5 p6_6 p6_6esp p7
## Min. :0.0000 Min. :0.0000 Length:2016 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 Class :character 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Mode :character Median :0.0000
## Mean :0.7143 Mean :0.8006 Mean :0.8591
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:2.0000
## Max. :2.0000 Max. :2.0000 Max. :3.0000
##
## p7_3 p8_1
## Min. : 0 Min. :0.0000
## 1st Qu.: 0 1st Qu.:0.0000
## Median : 0 Median :0.0000
## Mean : 3115 Mean :0.7054
## 3rd Qu.: 0 3rd Qu.:2.0000
## Max. :999999 Max. :2.0000
##
Only anio shows NA's (48 of the 2016 rows). It is column 11 of the selection, not column 10, because the selection starts at control: control, viv_sel, num_hog, hog_mud, n_ren_el, cd, periodo, sexo, edad, nivel, anio.
To compute the proportion of missing values in anio (multiply by 100 to get a percentage):
# Check column 11 (anio) of the 2024 selection
sum(is.na(M24selec[ ,11])) / length(M24selec[ ,11])
## [1] 0.02380952
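The result above is a proportion: about 2.4 % of the 2024 rows lack anio. Because the mean of a logical vector is the share of `TRUE` values, `mean(is.na(x))` gives the same number as `sum(is.na(x)) / length(x)`, and indexing by name (`M24selec$anio`) avoids depending on the column position. A sketch on a hypothetical toy vector:

```r
# Toy vector standing in for M24selec$anio (hypothetical values)
anio <- c(3L, NA, 6L, 3L, NA, 4L, 3L, 3L)

# The mean of a logical vector is the proportion of TRUE values
prop_na <- mean(is.na(anio))
prop_na          # 2 / 8 = 0.25
100 * prop_na    # the same value as a percentage
```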
Removing the NA's (2024)
Apply na.omit() to drop the rows with missing values from the 2024 selection. na.omit() removes a row if any selected column is NA; here only anio has NA's, so exactly those 48 rows are dropped:
M24selec = na.omit(M24selec)
summary(M24selec)
## control viv_sel num_hog hog_mud n_ren_el
## Min. :40001 Min. :1.000 Min. :1 Min. :0.00000 Min. :1.000
## 1st Qu.:40095 1st Qu.:2.000 1st Qu.:1 1st Qu.:0.00000 1st Qu.:1.000
## Median :40191 Median :3.000 Median :1 Median :0.00000 Median :1.000
## Mean :40243 Mean :2.503 Mean :1 Mean :0.03506 Mean :1.678
## 3rd Qu.:40315 3rd Qu.:3.000 3rd Qu.:1 3rd Qu.:0.00000 3rd Qu.:2.000
## Max. :41419 Max. :4.000 Max. :1 Max. :2.00000 Max. :8.000
## cd periodo sexo edad nivel
## Min. : 1.00 Min. :224 Min. :1.00 Min. :18.00 Min. :2.000
## 1st Qu.: 2.00 1st Qu.:224 1st Qu.:1.00 1st Qu.:32.00 1st Qu.:3.000
## Median : 9.00 Median :224 Median :2.00 Median :45.00 Median :4.000
## Mean :13.67 Mean :224 Mean :1.57 Mean :46.02 Mean :4.622
## 3rd Qu.:25.00 3rd Qu.:224 3rd Qu.:2.00 3rd Qu.:58.25 3rd Qu.:7.000
## Max. :43.00 Max. :224 Max. :2.00 Max. :94.00 Max. :9.000
## anio cond_act p1 p2 p3_1
## Min. :1.000 Min. : 1.00 Min. :1.000 Min. :0.00 Min. :0.00
## 1st Qu.:3.000 1st Qu.: 1.00 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.00
## Median :3.000 Median : 1.00 Median :1.000 Median :1.00 Median :2.00
## Mean :3.456 Mean : 3.29 Mean :1.011 Mean :1.42 Mean :1.57
## 3rd Qu.:4.000 3rd Qu.: 7.00 3rd Qu.:1.000 3rd Qu.:2.00 3rd Qu.:2.00
## Max. :9.000 Max. :10.00 Max. :2.000 Max. :2.00 Max. :2.00
## p3_2 p3_3 p3_4 p3_5
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
## Median :2.000 Median :2.000 Median :2.000 Median :2.000
## Mean :1.765 Mean :1.803 Mean :1.936 Mean :1.605
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
## p4 p5 p5_6esp p6_1
## Min. : 0.000 Min. :0.000 Length:1968 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.000 Class :character 1st Qu.:0.0000
## Median : 0.000 Median :0.000 Mode :character Median :0.0000
## Mean : 1.303 Mean :1.371 Mean :0.7871
## 3rd Qu.: 2.000 3rd Qu.:3.000 3rd Qu.:2.0000
## Max. :70.000 Max. :6.000 Max. :2.0000
## p6_2 p6_3 p6_4 p6_5
## Min. :0.0000 Min. :0.00 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.00 Median :0.0000 Median :0.0000
## Mean :0.6987 Mean :0.69 Mean :0.6575 Mean :0.7276
## 3rd Qu.:2.0000 3rd Qu.:2.00 3rd Qu.:1.0000 3rd Qu.:2.0000
## Max. :2.0000 Max. :2.00 Max. :2.0000 Max. :2.0000
## p6_6 p6_6esp p7 p7_3
## Min. :0.0000 Length:1968 Min. :0.0000 Min. : 0
## 1st Qu.:0.0000 Class :character 1st Qu.:0.0000 1st Qu.: 0
## Median :0.0000 Mode :character Median :0.0000 Median : 0
## Mean :0.8161 Mean :0.8755 Mean : 3191
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.: 0
## Max. :2.0000 Max. :3.0000 Max. :999999
## p8_1
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.7185
## 3rd Qu.:2.0000
## Max. :2.0000
The summary confirms that no missing values remain in the 2024 selection: 48 rows were removed, leaving 1968. The same steps are applied to M23selec next.
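Since na.omit() works row-wise on every column, it can drop more rows than expected when several variables have NA's. This hypothetical toy example contrasts it with filtering on a single column; in the real 2024 selection both approaches keep the same 1968 rows because only anio is missing.

```r
# Toy data with NA's in two different columns (hypothetical values)
toy <- data.frame(
  anio = c(3L, NA, 6L, 3L),
  p7_3 = c(0L, 0L, NA, 0L)
)

# na.omit() drops every row that has an NA in ANY column
no_na <- na.omit(toy)
nrow(no_na)        # 2 rows remain

# Drop rows only when anio is missing
anio_ok <- toy[!is.na(toy$anio), ]
nrow(anio_ok)      # 3 rows remain

# na.omit() stores the indices of the removed rows
attr(no_na, "na.action")
```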
Missing data (2023)
Check whether the variables of interest in the 2023 dataset have missing values:
summary(M23selec)
## control viv_sel num_hog hog_mud n_ren_el
## Min. :40001 Min. :1.000 Min. :1 Min. :0.00000 Min. : 1.000
## 1st Qu.:40100 1st Qu.:2.000 1st Qu.:1 1st Qu.:0.00000 1st Qu.: 1.000
## Median :40199 Median :2.000 Median :1 Median :0.00000 Median : 1.000
## Mean :40253 Mean :2.486 Mean :1 Mean :0.05561 Mean : 1.667
## 3rd Qu.:40335 3rd Qu.:3.000 3rd Qu.:1 3rd Qu.:0.00000 3rd Qu.: 2.000
## Max. :41398 Max. :4.000 Max. :1 Max. :3.00000 Max. :10.000
##
## cd periodo sexo edad nivel
## Min. : 1.00 Min. :223 Min. :1.000 Min. :18.00 Min. : 0.000
## 1st Qu.: 2.00 1st Qu.:223 1st Qu.:1.000 1st Qu.:32.00 1st Qu.: 3.000
## Median : 9.00 Median :223 Median :2.000 Median :45.00 Median : 4.000
## Mean :13.85 Mean :223 Mean :1.585 Mean :46.21 Mean : 4.709
## 3rd Qu.:25.00 3rd Qu.:223 3rd Qu.:2.000 3rd Qu.:59.00 3rd Qu.: 7.000
## Max. :43.00 Max. :223 Max. :2.000 Max. :97.00 Max. :99.000
##
## anio cond_act p1 p2
## Min. :1.000 Min. : 1.000 Min. :1.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.: 1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :3.000 Median : 1.000 Median :1.000 Median :1.000
## Mean :3.521 Mean : 3.269 Mean :1.019 Mean :1.419
## 3rd Qu.:4.000 3rd Qu.: 7.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :7.000 Max. :10.000 Max. :2.000 Max. :2.000
## NA's :48
## p3_1 p3_2 p3_3 p3_4 p3_5
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.00
## Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.00
## Mean :1.565 Mean :1.742 Mean :1.784 Mean :1.909 Mean :1.61
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.00
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.00
##
## p4 p5 p5_6esp p6_1
## Min. : 0.000 Min. :0.00 Length:1996 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.00 Class :character 1st Qu.:0.0000
## Median : 0.000 Median :0.00 Mode :character Median :0.0000
## Mean : 1.381 Mean :1.34 Mean :0.7665
## 3rd Qu.: 2.000 3rd Qu.:3.00 3rd Qu.:2.0000
## Max. :80.000 Max. :6.00 Max. :2.0000
##
## p6_2 p6_3 p6_4 p6_5
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.6784 Mean :0.6759 Mean :0.6473 Mean :0.6979
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:2.0000
## Max. :2.0000 Max. :2.0000 Max. :2.0000 Max. :2.0000
##
## p6_6 p6_6esp p7 p7_3
## Min. :0.0000 Length:1996 Min. :0.0000 Min. : 0
## 1st Qu.:0.0000 Class :character 1st Qu.:0.0000 1st Qu.: 0
## Median :0.0000 Mode :character Median :0.0000 Median : 0
## Mean :0.7956 Mean :0.8682 Mean : 2137
## 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.: 0
## Max. :2.0000 Max. :3.0000 Max. :999999
##
## p8_1
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.6904
## 3rd Qu.:2.0000
## Max. :2.0000
##
To compute the proportion of missing values in anio (column 11 of the selection):
# Check column 11 (anio) of the 2023 selection
sum(is.na(M23selec[ ,11])) / length(M23selec[ ,11])
## [1] 0.0240481
Apply na.omit() to drop the rows with missing values from the 2023 selection; the 48 rows without anio are removed, leaving 1948:
M23selec = na.omit(M23selec)
summary(M23selec)
## control viv_sel num_hog hog_mud n_ren_el
## Min. :40001 Min. :1.000 Min. :1 Min. :0.00000 Min. : 1.000
## 1st Qu.:40101 1st Qu.:2.000 1st Qu.:1 1st Qu.:0.00000 1st Qu.: 1.000
## Median :40199 Median :2.000 Median :1 Median :0.00000 Median : 1.000
## Mean :40250 Mean :2.495 Mean :1 Mean :0.05339 Mean : 1.669
## 3rd Qu.:40333 3rd Qu.:3.000 3rd Qu.:1 3rd Qu.:0.00000 3rd Qu.: 2.000
## Max. :41398 Max. :4.000 Max. :1 Max. :2.00000 Max. :10.000
## cd periodo sexo edad nivel
## Min. : 1.00 Min. :223 Min. :1.000 Min. :18.00 Min. :2.000
## 1st Qu.: 2.00 1st Qu.:223 1st Qu.:1.000 1st Qu.:32.00 1st Qu.:3.000
## Median : 9.00 Median :223 Median :2.000 Median :45.00 Median :4.000
## Mean :13.77 Mean :223 Mean :1.583 Mean :45.82 Mean :4.571
## 3rd Qu.:25.00 3rd Qu.:223 3rd Qu.:2.000 3rd Qu.:58.00 3rd Qu.:7.000
## Max. :43.00 Max. :223 Max. :2.000 Max. :97.00 Max. :9.000
## anio cond_act p1 p2
## Min. :1.000 Min. : 1.000 Min. :1.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.: 1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :3.000 Median : 1.000 Median :1.000 Median :1.000
## Mean :3.521 Mean : 3.244 Mean :1.005 Mean :1.437
## 3rd Qu.:4.000 3rd Qu.: 7.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :7.000 Max. :10.000 Max. :2.000 Max. :2.000
## p3_1 p3_2 p3_3 p3_4 p3_5
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.00
## Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.00
## Mean :1.585 Mean :1.765 Mean :1.808 Mean :1.936 Mean :1.63
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.00
## Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.00
## p4 p5 p5_6esp p6_1
## Min. : 0.000 Min. :0.000 Length:1948 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.000 Class :character 1st Qu.:0.0000
## Median : 0.000 Median :0.000 Mode :character Median :0.0000
## Mean : 1.412 Mean :1.363 Mean :0.7813
## 3rd Qu.: 2.000 3rd Qu.:3.000 3rd Qu.:2.0000
## Max. :80.000 Max. :6.000 Max. :2.0000
## p6_2 p6_3 p6_4 p6_5
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.000
## Mean :0.691 Mean :0.6905 Mean :0.6591 Mean :0.711
## 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:2.000
## Max. :2.000 Max. :2.0000 Max. :2.0000 Max. :2.000
## p6_6 p6_6esp p7 p7_3
## Min. :0.0000 Length:1948 Min. :0.000 Min. : 0
## 1st Qu.:0.0000 Class :character 1st Qu.:0.000 1st Qu.: 0
## Median :0.0000 Mode :character Median :0.000 Median : 0
## Mean :0.8111 Mean :0.885 Mean : 2190
## 3rd Qu.:2.0000 3rd Qu.:2.000 3rd Qu.: 0
## Max. :2.0000 Max. :3.000 Max. :999999
## p8_1
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.7033
## 3rd Qu.:2.0000
## Max. :2.0000
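The 2023 and 2024 sections repeat the same three steps (select, measure NA's, drop incomplete rows). A minimal sketch of a helper that applies them to both years is shown below; `clean_selection` is a hypothetical name, the columns are chosen by name rather than by position, and the data frames are toy stand-ins.

```r
# Hypothetical helper: select the columns from `first` to `last`, report the
# proportion of NA's in anio and drop incomplete rows with na.omit()
clean_selection <- function(df, first = "control", last = "p8_1") {
  stopifnot(all(c(first, last) %in% names(df)))
  sel <- df[, match(first, names(df)):match(last, names(df))]
  message("Proportion of NA in anio: ", round(mean(is.na(sel$anio)), 4))
  na.omit(sel)
}

# Toy stand-ins for M23 and M24 (hypothetical values)
toy23 <- data.frame(folio = c("a", "b", "c"), control = 1:3,
                    anio = c(3L, NA, 4L), p8_1 = c(0L, 2L, 2L))
toy24 <- data.frame(folio = c("d", "e", "f", "g"), control = 4:7,
                    anio = c(NA, 3L, 3L, 6L), p8_1 = c(0L, 0L, 1L, 2L))

clean <- lapply(list(y2023 = toy23, y2024 = toy24), clean_selection)
sapply(clean, nrow)
```

Applied to the real data as `lapply(list(y2023 = M23, y2024 = M24), clean_selection)`, it should reproduce M23selec and M24selec, provided both files contain the columns control through p8_1.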