Exploring the databases (2023 and 2024)

# Load the 2023 database
M23 = read.csv("Datos_molec_2023-1.csv")
# Load the 2024 database
M24 = read.csv("Datos_molec_2024-1.csv")

# Structure of the 2023 database, for the initial exploration
str(M23)
## 'data.frame':    1996 obs. of  108 variables:
##  $ folio   : chr  "11B213" "11B213" "11B213" "11B213" ...
##  $ entidad : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ control : int  40003 40003 40003 40003 40007 40007 40007 40007 40034 40034 ...
##  $ viv_sel : int  2 1 4 3 4 3 2 1 4 1 ...
##  $ num_hog : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ hog_mud : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ n_ren_el: int  1 1 1 2 1 2 2 1 2 2 ...
##  $ cd      : int  14 14 14 14 14 14 14 14 14 14 ...
##  $ periodo : int  223 223 223 223 223 223 223 223 223 223 ...
##  $ sexo    : int  1 1 2 2 1 2 2 1 2 2 ...
##  $ edad    : int  29 53 65 44 52 36 74 53 23 57 ...
##  $ nivel   : int  4 3 2 3 6 3 2 3 7 4 ...
##  $ anio    : int  3 3 6 3 3 3 6 3 4 3 ...
##  $ cond_act: int  1 1 7 7 1 1 7 1 1 1 ...
##  $ p1      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ p2      : int  2 2 2 2 2 2 1 2 1 2 ...
##  $ p3_1    : int  2 2 2 2 2 2 1 2 1 2 ...
##  $ p3_2    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ p3_3    : int  2 1 2 2 2 2 2 2 2 2 ...
##  $ p3_4    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ p3_5    : int  2 2 2 2 1 2 2 1 1 2 ...
##  $ p4      : int  0 0 0 0 0 0 2 0 3 0 ...
##  $ p5      : int  0 0 0 0 0 0 4 0 2 0 ...
##  $ p5_6esp : chr  "" "" "" "" ...
##  $ p6_1    : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p6_2    : int  0 0 0 0 0 0 2 0 1 0 ...
##  $ p6_3    : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p6_4    : int  0 0 0 0 0 0 1 0 2 0 ...
##  $ p6_5    : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p6_6    : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p6_6esp : chr  "" "" "" "" ...
##  $ p7      : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p7_3    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p8_1    : int  0 0 0 0 0 0 2 0 2 0 ...
##  $ p8_2    : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ p9      : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ p9_5esp : chr  "" "" "" "" ...
##  $ p10     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p11     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p11_6esp: chr  "" "" "" "" ...
##  $ p12_1   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_2   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_3   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_4   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_5   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_6   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_7   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_8   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_9   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p12_9esp: chr  "" "" "" "" ...
##  $ p13     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p13_3   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p14_1   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p14_2   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p15     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p15_5esp: chr  "" "" "" "" ...
##  $ p16     : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p17     : int  0 4 0 0 0 0 0 0 0 0 ...
##  $ p17_6esp: chr  "" "" "" "" ...
##  $ p18_1   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p18_2   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p18_3   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p18_4   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p18_5   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p19     : int  0 2 0 0 0 0 0 0 0 0 ...
##  $ p19_3   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p20_1   : int  0 2 0 0 0 0 0 0 0 0 ...
##  $ p20_2   : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ p21     : int  0 4 0 0 0 0 0 0 0 0 ...
##  $ p21_5esp: chr  "" "" "" "" ...
##  $ p22     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p23_1   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p23_2   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p24     : int  0 0 0 0 2 0 0 5 2 0 ...
##  $ p25     : int  0 0 0 0 1 0 0 4 2 0 ...
##  $ p25_6esp: chr  "" "" "" "" ...
##  $ p26     : int  0 10 0 0 20 0 30 15 30 0 ...
##  $ p27     : int  0 2 0 0 2 0 2 2 2 0 ...
##  $ p28     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ p28_7esp: chr  "" "" "" "" ...
##  $ p29     : int  0 2 0 0 2 0 2 2 2 0 ...
##  $ p30     : int  0 2 0 0 3 0 3 3 3 0 ...
##  $ p31     : int  0 2 0 0 1 0 2 2 1 0 ...
##  $ p32     : int  3 0 1 2 0 3 0 0 0 2 ...
##  $ p32_6esp: chr  "" "" "" "" ...
##  $ p33_1   : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ p33_2   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ p33_3   : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ p33_4   : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ p34_1   : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ p34_2   : int  2 1 2 2 1 2 1 1 1 2 ...
##  $ p34_3   : int  2 2 2 2 1 2 2 1 1 2 ...
##  $ p34_3_1 : int  0 0 0 0 2 0 0 1 2 0 ...
##  $ p34_4   : int  1 1 1 1 1 1 1 1 1 2 ...
##  $ p34_4_1 : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ p35     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ p36_1   : int  1 2 2 1 1 1 2 1 1 1 ...
##  $ p36_2   : int  1 1 2 1 1 1 2 1 1 1 ...
##  $ p36_3   : int  1 2 2 1 1 1 2 1 1 1 ...
##   [list output truncated]

Most of the variables are qualitative, but they are stored as numeric codes. Which ones are genuinely numeric has to be checked against the context of the problem.
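Here is a minimal sketch of that review step. It runs on a small toy data frame built from the first values printed by str(M23), not on the real file. It first tallies the storage class of each column. It then converts a hypothetical list of coded columns (`cat_vars`) into factors. Which variables really are categorical still has to be confirmed against the questionnaire.

```r
# Toy stand-in for M23: first values of a few columns as printed by str(M23)
df <- data.frame(
  folio = c("11B213", "11B213", "11B213", "11B213"),
  sexo  = c(1L, 1L, 2L, 2L),
  edad  = c(29L, 53L, 65L, 44L),
  nivel = c(4L, 3L, 2L, 3L),
  p1    = c(1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# How many columns of each storage class are there?
table(sapply(df, class))

# Hypothetical list of coded (categorical) variables; check it against the questionnaire
cat_vars <- c("sexo", "nivel", "p1")
df[cat_vars] <- lapply(df[cat_vars], factor)

# edad stays an integer; the coded columns are now factors
str(df)
```

Leaving edad as an integer keeps it available for numeric summaries. Once the descriptor gives the meaning of each code, the factors can be labelled with `factor(x, levels = ..., labels = ...)`.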

Dictionaries

# Load the 2023 dictionary
Dicc23 = read.csv("Diccionario_molec_2023-1.csv")
head(Dicc23)
##             NOMBRE_CAMPO TIPO NEMÓNICO CATÁLOGO
## 1                 Folio     C    folio         
## 2               Entidad     C  entidad entidad 
## 3               Control     C  control         
## 4 Vivienda seleccionada     C  viv_sel viv_sel 
## 5       Número de hogar     C  num_hog  num_hog
## 6          hogar mudado     C  hog_mud hog_mud 
##                                         RANGO_CLAVES
## 1 1 Zona, 1...3 Estrato, Últimos 4 panel de rotación
## 2                                            01...32
## 3                                      40001...41398
## 4                                              1...9
## 5                                              1...9
## 6                                              0...3

The dictionary describes what each variable means in the context of the problem, but not what its categories stand for. At most, RANGO_CLAVES gives the range of valid codes. The meaning of each code has to be read from the question descriptor file and from the questionnaire.
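The data and the dictionary share each variable's mnemonic (the NEMÓNICO column), so a variable's meaning can be looked up by filtering the dictionary on that column. The sketch below uses a toy dictionary with rows copied from head(Dicc23). It assumes an ASCII column name, NEMONICO, because the real name after read.csv() may vary with the file encoding. `describe_vars()` is a hypothetical helper. trimws() guards against stray spaces like those visible in the CATÁLOGO column.

```r
# Toy stand-in for Dicc23 (rows copied from head(Dicc23)); column name is an ASCII assumption
dicc <- data.frame(
  NOMBRE_CAMPO = c("Entidad", "Control", "Vivienda seleccionada", "hogar mudado"),
  NEMONICO     = c("entidad", "control", "viv_sel", "hog_mud"),
  RANGO_CLAVES = c("01...32", "40001...41398", "1...9", "0...3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: dictionary rows whose mnemonic matches any of `vars`
describe_vars <- function(dicc, vars, key = "NEMONICO") {
  dicc[trimws(dicc[[key]]) %in% vars, , drop = FALSE]
}

res <- describe_vars(dicc, c("entidad", "viv_sel"))
res

# Variables in the data that this (toy) dictionary does not describe
setdiff(c("entidad", "sexo"), trimws(dicc$NEMONICO))
```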

Summary of the variables (2023)

summary(M23)
##     folio              entidad         control         viv_sel         num_hog 
##  Length:1996        Min.   : 1.00   Min.   :40001   Min.   :1.000   Min.   :1  
##  Class :character   1st Qu.: 9.00   1st Qu.:40100   1st Qu.:2.000   1st Qu.:1  
##  Mode  :character   Median :15.00   Median :40199   Median :2.000   Median :1  
##                     Mean   :15.61   Mean   :40253   Mean   :2.486   Mean   :1  
##                     3rd Qu.:21.00   3rd Qu.:40335   3rd Qu.:3.000   3rd Qu.:1  
##                     Max.   :32.00   Max.   :41398   Max.   :4.000   Max.   :1  
##                                                                                
##     hog_mud           n_ren_el            cd           periodo   
##  Min.   :0.00000   Min.   : 1.000   Min.   : 1.00   Min.   :223  
##  1st Qu.:0.00000   1st Qu.: 1.000   1st Qu.: 2.00   1st Qu.:223  
##  Median :0.00000   Median : 1.000   Median : 9.00   Median :223  
##  Mean   :0.05561   Mean   : 1.667   Mean   :13.85   Mean   :223  
##  3rd Qu.:0.00000   3rd Qu.: 2.000   3rd Qu.:25.00   3rd Qu.:223  
##  Max.   :3.00000   Max.   :10.000   Max.   :43.00   Max.   :223  
##                                                                  
##       sexo            edad           nivel             anio      
##  Min.   :1.000   Min.   :18.00   Min.   : 0.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:32.00   1st Qu.: 3.000   1st Qu.:3.000  
##  Median :2.000   Median :45.00   Median : 4.000   Median :3.000  
##  Mean   :1.585   Mean   :46.21   Mean   : 4.709   Mean   :3.521  
##  3rd Qu.:2.000   3rd Qu.:59.00   3rd Qu.: 7.000   3rd Qu.:4.000  
##  Max.   :2.000   Max.   :97.00   Max.   :99.000   Max.   :7.000  
##                                                   NA's   :48     
##     cond_act            p1              p2             p3_1      
##  Min.   : 1.000   Min.   :1.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median : 1.000   Median :1.000   Median :1.000   Median :2.000  
##  Mean   : 3.269   Mean   :1.019   Mean   :1.419   Mean   :1.565  
##  3rd Qu.: 7.000   3rd Qu.:1.000   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :10.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
##                                                                  
##       p3_2            p3_3            p3_4            p3_5     
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.00  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.00  
##  Mean   :1.742   Mean   :1.784   Mean   :1.909   Mean   :1.61  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.00  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.00  
##                                                                
##        p4               p5         p5_6esp               p6_1       
##  Min.   : 0.000   Min.   :0.00   Length:1996        Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.00   Class :character   1st Qu.:0.0000  
##  Median : 0.000   Median :0.00   Mode  :character   Median :0.0000  
##  Mean   : 1.381   Mean   :1.34                      Mean   :0.7665  
##  3rd Qu.: 2.000   3rd Qu.:3.00                      3rd Qu.:2.0000  
##  Max.   :80.000   Max.   :6.00                      Max.   :2.0000  
##                                                                     
##       p6_2             p6_3             p6_4             p6_5       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.6784   Mean   :0.6759   Mean   :0.6473   Mean   :0.6979  
##  3rd Qu.:2.0000   3rd Qu.:2.0000   3rd Qu.:1.0000   3rd Qu.:2.0000  
##  Max.   :2.0000   Max.   :2.0000   Max.   :2.0000   Max.   :2.0000  
##                                                                     
##       p6_6          p6_6esp                p7              p7_3       
##  Min.   :0.0000   Length:1996        Min.   :0.0000   Min.   :     0  
##  1st Qu.:0.0000   Class :character   1st Qu.:0.0000   1st Qu.:     0  
##  Median :0.0000   Mode  :character   Median :0.0000   Median :     0  
##  Mean   :0.7956                      Mean   :0.8682   Mean   :  2137  
##  3rd Qu.:2.0000                      3rd Qu.:2.0000   3rd Qu.:     0  
##  Max.   :2.0000                      Max.   :3.0000   Max.   :999999  
##                                                                       
##       p8_1             p8_2              p9           p9_5esp         
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Length:1996       
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   Class :character  
##  Median :0.0000   Median :0.0000   Median :0.0000   Mode  :character  
##  Mean   :0.6904   Mean   :0.4749   Mean   :0.4594                     
##  3rd Qu.:2.0000   3rd Qu.:1.0000   3rd Qu.:1.0000                     
##  Max.   :2.0000   Max.   :2.0000   Max.   :5.0000                     
##                                                                       
##       p10              p11           p11_6esp             p12_1       
##  Min.   : 0.000   Min.   :0.0000   Length:1996        Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.0000   Class :character   1st Qu.:0.0000  
##  Median : 0.000   Median :0.0000   Mode  :character   Median :0.0000  
##  Mean   : 0.756   Mean   :0.7771                      Mean   :0.4254  
##  3rd Qu.: 0.000   3rd Qu.:0.0000                      3rd Qu.:0.0000  
##  Max.   :70.000   Max.   :6.0000                      Max.   :2.0000  
##                                                                       
##      p12_2            p12_3            p12_4            p12_5       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.4088   Mean   :0.4013   Mean   :0.4018   Mean   :0.4143  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :2.0000   Max.   :2.0000   Max.   :2.0000   Max.   :2.0000  
##                                                                     
##      p12_6            p12_7            p12_8            p12_9       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.4008   Mean   :0.3978   Mean   :0.3798   Mean   :0.4409  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :2.0000   Max.   :2.0000   Max.   :2.0000   Max.   :2.0000  
##                                                                     
##    p12_9esp              p13             p13_3             p14_1       
##  Length:1996        Min.   :0.0000   Min.   :   0.00   Min.   :0.0000  
##  Class :character   1st Qu.:0.0000   1st Qu.:   0.00   1st Qu.:0.0000  
##  Mode  :character   Median :0.0000   Median :   0.00   Median :0.0000  
##                     Mean   :0.4755   Mean   :  12.73   Mean   :0.3988  
##                     3rd Qu.:0.0000   3rd Qu.:   0.00   3rd Qu.:0.0000  
##                     Max.   :3.0000   Max.   :3500.00   Max.   :2.0000  
##                                                                        
##      p14_2            p15           p15_5esp              p16         
##  Min.   :0.000   Min.   :0.0000   Length:1996        Min.   : 0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   Class :character   1st Qu.: 0.0000  
##  Median :0.000   Median :0.0000   Mode  :character   Median : 0.0000  
##  Mean   :0.256   Mean   :0.2956                      Mean   : 0.5792  
##  3rd Qu.:0.000   3rd Qu.:0.0000                      3rd Qu.: 0.0000  
##  Max.   :2.000   Max.   :5.0000                      Max.   :50.0000  
##                                                                       
##       p17           p17_6esp             p18_1            p18_2       
##  Min.   :0.0000   Length:1996        Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   Class :character   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Mode  :character   Median :0.0000   Median :0.0000  
##  Mean   :0.5842                      Mean   :0.2244   Mean   :0.2355  
##  3rd Qu.:0.0000                      3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :6.0000                      Max.   :2.0000   Max.   :2.0000  
##                                                                       
##      p18_3           p18_4           p18_5             p19        
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :0.000   Median :0.0000   Median :0.0000  
##  Mean   :0.235   Mean   :0.254   Mean   :0.2129   Mean   :0.4228  
##  3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :2.000   Max.   :2.000   Max.   :2.0000   Max.   :3.0000  
##                                                                   
##      p19_3            p20_1            p20_2             p21        
##  Min.   :     0   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:     0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :     0   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :  1006   Mean   :0.3307   Mean   :0.2039   Mean   :0.2525  
##  3rd Qu.:     0   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :999999   Max.   :2.0000   Max.   :2.0000   Max.   :5.0000  
##                                                                     
##    p21_5esp              p22             p23_1             p23_2        
##  Length:1996        Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  Class :character   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Mode  :character   Median :0.0000   Median :0.00000   Median :0.00000  
##                     Mean   :0.1658   Mean   :0.07415   Mean   :0.08417  
##                     3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##                     Max.   :5.0000   Max.   :2.00000   Max.   :2.00000  
##                                                                         
##       p24              p25          p25_6esp              p26        
##  Min.   :0.0000   Min.   :0.000   Length:1996        Min.   :  0.00  
##  1st Qu.:0.0000   1st Qu.:0.000   Class :character   1st Qu.:  0.00  
##  Median :0.0000   Median :0.000   Mode  :character   Median : 20.00  
##  Mean   :0.8091   Mean   :1.053                      Mean   : 27.44  
##  3rd Qu.:1.0000   3rd Qu.:3.000                      3rd Qu.: 30.00  
##  Max.   :5.0000   Max.   :6.000                      Max.   :480.00  
##                                                                      
##       p27             p28           p28_7esp              p29            p30   
##  Min.   :0.000   Min.   :0.0000   Length:1996        Min.   :0.00   Min.   :0  
##  1st Qu.:0.000   1st Qu.:0.0000   Class :character   1st Qu.:0.00   1st Qu.:0  
##  Median :2.000   Median :0.0000   Mode  :character   Median :2.00   Median :3  
##  Mean   :1.195   Mean   :0.6012                      Mean   :1.62   Mean   :2  
##  3rd Qu.:2.000   3rd Qu.:0.0000                      3rd Qu.:3.00   3rd Qu.:3  
##  Max.   :2.000   Max.   :7.0000                      Max.   :4.00   Max.   :4  
##                                                                                
##       p31             p32           p32_6esp             p33_1      
##  Min.   :0.000   Min.   :0.0000   Length:1996        Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:0.0000   Class :character   1st Qu.:2.000  
##  Median :1.000   Median :0.0000   Mode  :character   Median :2.000  
##  Mean   :1.034   Mean   :0.8888                      Mean   :1.804  
##  3rd Qu.:2.000   3rd Qu.:1.0000                      3rd Qu.:2.000  
##  Max.   :2.000   Max.   :6.0000                      Max.   :2.000  
##                                                                     
##      p33_2           p33_3           p33_4           p34_1      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :1.868   Mean   :1.902   Mean   :1.843   Mean   :1.691  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :3.000  
##                                                                 
##      p34_2           p34_3          p34_3_1           p34_4      
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:1.000  
##  Median :1.000   Median :2.000   Median :0.0000   Median :1.000  
##  Mean   :1.487   Mean   :1.649   Mean   :0.5616   Mean   :1.432  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:2.000  
##  Max.   :3.000   Max.   :3.000   Max.   :3.0000   Max.   :3.000  
##                                                                  
##     p34_4_1           p35             p36_1           p36_2      
##  Min.   :0.000   Min.   :0.0000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000   Median :1.0000   Median :1.000   Median :1.000  
##  Mean   :0.999   Mean   :0.9895   Mean   :1.421   Mean   :1.228  
##  3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:1.000  
##  Max.   :6.000   Max.   :2.0000   Max.   :3.000   Max.   :3.000  
##                                                                  
##      p36_3           p36_4           factor           h_lec      
##  Min.   :0.000   Min.   :0.000   Min.   :  1634   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 12854   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median : 18528   Median :1.000  
##  Mean   :1.337   Mean   :1.444   Mean   : 21120   Mean   :2.182  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.: 26649   3rd Qu.:4.000  
##  Max.   :3.000   Max.   :3.000   Max.   :105760   Max.   :4.000  
##                                                                  
##     mat_lec         perslec         l_format         r_format     
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :3.000   Median :1.000   Median :0.0000   Median :0.0000  
##  Mean   :2.752   Mean   :1.307   Mean   :0.7495   Mean   :0.4148  
##  3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:2.0000   3rd Qu.:0.0000  
##  Max.   :4.000   Max.   :2.000   Max.   :3.0000   Max.   :3.0000  
##                                                                   
##     p_format         perslecl    
##  Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:1.000  
##  Median :0.0000   Median :1.000  
##  Mean   :0.3347   Mean   :1.398  
##  3rd Qu.:0.0000   3rd Qu.:2.000  
##  Max.   :3.0000   Max.   :2.000  
## 

summary() computes the mean, median, minimum and maximum for almost every variable, but for many of them these measures are meaningless because the variables are categorical. Its main use here is to show which variables have missing data (NA's). In 2023, only anio does, with 48.
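Reading NA's off a 108-column summary is tedious. A more compact check is to count missing values per column with colSums(is.na(...)) and keep only the non-zero entries. The sketch below uses toy data; in practice M23 itself would be passed in.

```r
# Toy stand-in for M23 with a few missing values in anio
df <- data.frame(
  edad = c(29L, 53L, 65L, 44L, 52L),
  anio = c(3L, NA, 6L, NA, 3L),
  p1   = c(1L, 1L, 1L, 1L, 1L)
)

# Number of NA's in each column
na_counts <- colSums(is.na(df))

# Keep only the columns that actually have missing data
na_counts[na_counts > 0]
```

On the real data, colSums(is.na(M23)) should agree with the summary, which reports 48 NA's for anio.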

Variable selection

We keep the selection of columns 3 to 34 from the original code and apply it to the new databases:

# Select columns 3 to 34 for 2023
M23selec = M23[3:34]
# Select columns 3 to 34 for 2024
M24selec = M24[3:34]
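M23[3:34] and M24[3:34] select columns by position. They only pick the same variables if both files keep the same column order. A slightly safer variant takes the names from the 2023 selection and uses them to subset the 2024 data. This is a sketch on hypothetical toy data frames, not the real files; in the toy 2024 frame the columns are deliberately reordered.

```r
# Toy stand-ins: the hypothetical 2024 frame lists the same variables in another order
M23_toy <- data.frame(folio = c("11B213", "11B213"), entidad = c(1L, 1L),
                      control = c(40003L, 40003L), viv_sel = c(2L, 1L),
                      stringsAsFactors = FALSE)
M24_toy <- data.frame(control = c(40001L, 40002L), folio = c("x", "y"),
                      viv_sel = c(1L, 3L), entidad = c(5L, 5L),
                      stringsAsFactors = FALSE)

# Variables chosen by position in the 2023 frame (3:34 in the real data)
vars <- names(M23_toy)[3:4]

sel23 <- M23_toy[vars]
sel24 <- M24_toy[vars]   # same variables, selected by name

names(M24_toy[3:4])      # positional selection picks different variables here
names(sel24)             # "control" "viv_sel"
```

If a name in `vars` is missing from the 2024 data, `M24[vars]` stops with an "undefined columns selected" error. That flags the mismatch early instead of silently selecting other variables.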

Missing data (2024)

I check whether the variables of interest in the 2024 database have missing data:


summary(M24selec)
##     control         viv_sel         num_hog     hog_mud           n_ren_el    
##  Min.   :40001   Min.   :1.000   Min.   :1   Min.   :0.00000   Min.   :1.000  
##  1st Qu.:40095   1st Qu.:1.750   1st Qu.:1   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :40191   Median :2.000   Median :1   Median :0.00000   Median :1.000  
##  Mean   :40244   Mean   :2.493   Mean   :1   Mean   :0.03472   Mean   :1.673  
##  3rd Qu.:40315   3rd Qu.:3.000   3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :41419   Max.   :4.000   Max.   :1   Max.   :2.00000   Max.   :8.000  
##                                                                               
##        cd           periodo         sexo            edad           nivel      
##  Min.   : 1.00   Min.   :224   Min.   :1.000   Min.   :18.00   Min.   : 0.00  
##  1st Qu.: 2.00   1st Qu.:224   1st Qu.:1.000   1st Qu.:33.00   1st Qu.: 3.00  
##  Median : 9.00   Median :224   Median :2.000   Median :46.00   Median : 4.00  
##  Mean   :13.74   Mean   :224   Mean   :1.574   Mean   :46.49   Mean   : 4.66  
##  3rd Qu.:25.00   3rd Qu.:224   3rd Qu.:2.000   3rd Qu.:59.00   3rd Qu.: 7.00  
##  Max.   :43.00   Max.   :224   Max.   :2.000   Max.   :94.00   Max.   :99.00  
##                                                                               
##       anio          cond_act            p1              p2       
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median : 1.000   Median :1.000   Median :1.000  
##  Mean   :3.456   Mean   : 3.325   Mean   :1.023   Mean   :1.405  
##  3rd Qu.:4.000   3rd Qu.: 7.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :9.000   Max.   :10.000   Max.   :2.000   Max.   :2.000  
##  NA's   :48                                                      
##       p3_1            p3_2            p3_3            p3_4      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :1.552   Mean   :1.745   Mean   :1.781   Mean   :1.912  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
##                                                                 
##       p3_5             p4               p5          p5_6esp         
##  Min.   :0.000   Min.   : 0.000   Min.   :0.000   Length:2016       
##  1st Qu.:1.000   1st Qu.: 0.000   1st Qu.:0.000   Class :character  
##  Median :2.000   Median : 0.000   Median :0.000   Mode  :character  
##  Mean   :1.588   Mean   : 1.275   Mean   :1.348                     
##  3rd Qu.:2.000   3rd Qu.: 2.000   3rd Qu.:3.000                     
##  Max.   :2.000   Max.   :70.000   Max.   :6.000                     
##                                                                     
##       p6_1             p6_2            p6_3             p6_4       
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.000   Median :0.0000   Median :0.0000  
##  Mean   :0.7723   Mean   :0.686   Mean   :0.6756   Mean   :0.6458  
##  3rd Qu.:2.0000   3rd Qu.:2.000   3rd Qu.:2.0000   3rd Qu.:1.0000  
##  Max.   :2.0000   Max.   :2.000   Max.   :2.0000   Max.   :2.0000  
##                                                                    
##       p6_5             p6_6          p6_6esp                p7        
##  Min.   :0.0000   Min.   :0.0000   Length:2016        Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   Class :character   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Mode  :character   Median :0.0000  
##  Mean   :0.7143   Mean   :0.8006                      Mean   :0.8591  
##  3rd Qu.:2.0000   3rd Qu.:2.0000                      3rd Qu.:2.0000  
##  Max.   :2.0000   Max.   :2.0000                      Max.   :3.0000  
##                                                                       
##       p7_3             p8_1       
##  Min.   :     0   Min.   :0.0000  
##  1st Qu.:     0   1st Qu.:0.0000  
##  Median :     0   Median :0.0000  
##  Mean   :  3115   Mean   :0.7054  
##  3rd Qu.:     0   3rd Qu.:2.0000  
##  Max.   :999999   Max.   :2.0000  
## 

The summary shows NA's only for anio (48 values), so it is the only selected variable with missing data.

To compute the proportion of missing values in anio (about 2.4 %), which is column 11 of the selection (control, viv_sel, num_hog, hog_mud, n_ren_el, cd, periodo, sexo, edad, nivel, anio):

# Proportion of NA's in column 11 of the 2024 selection (anio)
sum(is.na(M24selec[ ,11])) / length(M24selec[ ,11])
## [1] 0.02380952
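Indexing by position depends on the column order, so selecting the column by name is less error-prone. colMeans(is.na(...)) gives the share of missing values in every column at once. A sketch on toy data:

```r
# Toy stand-in for M24selec: 1 of 4 values of anio is missing
df <- data.frame(
  control = c(40001L, 40002L, 40003L, 40004L),
  anio    = c(3L, NA, 4L, 3L)
)

# Proportion of NA's in anio, selected by name instead of position
mean(is.na(df$anio))

# Percentage of NA's in every column
round(100 * colMeans(is.na(df)), 2)
```

On the real data, mean(is.na(M24selec$anio)) is the same computation as the line above, so it should return the same 0.02380952 (48 of 2016 rows).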

Removing the NA's (2024)

We apply na.omit() to drop the rows with missing data from the 2024 selection:

M24selec = na.omit(M24selec)
summary(M24selec)
##     control         viv_sel         num_hog     hog_mud           n_ren_el    
##  Min.   :40001   Min.   :1.000   Min.   :1   Min.   :0.00000   Min.   :1.000  
##  1st Qu.:40095   1st Qu.:2.000   1st Qu.:1   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :40191   Median :3.000   Median :1   Median :0.00000   Median :1.000  
##  Mean   :40243   Mean   :2.503   Mean   :1   Mean   :0.03506   Mean   :1.678  
##  3rd Qu.:40315   3rd Qu.:3.000   3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :41419   Max.   :4.000   Max.   :1   Max.   :2.00000   Max.   :8.000  
##        cd           periodo         sexo           edad           nivel      
##  Min.   : 1.00   Min.   :224   Min.   :1.00   Min.   :18.00   Min.   :2.000  
##  1st Qu.: 2.00   1st Qu.:224   1st Qu.:1.00   1st Qu.:32.00   1st Qu.:3.000  
##  Median : 9.00   Median :224   Median :2.00   Median :45.00   Median :4.000  
##  Mean   :13.67   Mean   :224   Mean   :1.57   Mean   :46.02   Mean   :4.622  
##  3rd Qu.:25.00   3rd Qu.:224   3rd Qu.:2.00   3rd Qu.:58.25   3rd Qu.:7.000  
##  Max.   :43.00   Max.   :224   Max.   :2.00   Max.   :94.00   Max.   :9.000  
##       anio          cond_act           p1              p2            p3_1     
##  Min.   :1.000   Min.   : 1.00   Min.   :1.000   Min.   :0.00   Min.   :0.00  
##  1st Qu.:3.000   1st Qu.: 1.00   1st Qu.:1.000   1st Qu.:1.00   1st Qu.:1.00  
##  Median :3.000   Median : 1.00   Median :1.000   Median :1.00   Median :2.00  
##  Mean   :3.456   Mean   : 3.29   Mean   :1.011   Mean   :1.42   Mean   :1.57  
##  3rd Qu.:4.000   3rd Qu.: 7.00   3rd Qu.:1.000   3rd Qu.:2.00   3rd Qu.:2.00  
##  Max.   :9.000   Max.   :10.00   Max.   :2.000   Max.   :2.00   Max.   :2.00  
##       p3_2            p3_3            p3_4            p3_5      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :1.765   Mean   :1.803   Mean   :1.936   Mean   :1.605  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
##        p4               p5          p5_6esp               p6_1       
##  Min.   : 0.000   Min.   :0.000   Length:1968        Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.000   Class :character   1st Qu.:0.0000  
##  Median : 0.000   Median :0.000   Mode  :character   Median :0.0000  
##  Mean   : 1.303   Mean   :1.371                      Mean   :0.7871  
##  3rd Qu.: 2.000   3rd Qu.:3.000                      3rd Qu.:2.0000  
##  Max.   :70.000   Max.   :6.000                      Max.   :2.0000  
##       p6_2             p6_3           p6_4             p6_5       
##  Min.   :0.0000   Min.   :0.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00   Median :0.0000   Median :0.0000  
##  Mean   :0.6987   Mean   :0.69   Mean   :0.6575   Mean   :0.7276  
##  3rd Qu.:2.0000   3rd Qu.:2.00   3rd Qu.:1.0000   3rd Qu.:2.0000  
##  Max.   :2.0000   Max.   :2.00   Max.   :2.0000   Max.   :2.0000  
##       p6_6          p6_6esp                p7              p7_3       
##  Min.   :0.0000   Length:1968        Min.   :0.0000   Min.   :     0  
##  1st Qu.:0.0000   Class :character   1st Qu.:0.0000   1st Qu.:     0  
##  Median :0.0000   Mode  :character   Median :0.0000   Median :     0  
##  Mean   :0.8161                      Mean   :0.8755   Mean   :  3191  
##  3rd Qu.:2.0000                      3rd Qu.:2.0000   3rd Qu.:     0  
##  Max.   :2.0000                      Max.   :3.0000   Max.   :999999  
##       p8_1       
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.7185  
##  3rd Qu.:2.0000  
##  Max.   :2.0000

This confirms that the 2024 selection no longer has missing data. The NA's line under anio is gone, and the selection drops from 2016 to 1968 rows (the 48 rows with a missing anio). The same process should be repeated for M23selec if needed.
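na.omit() drops every row that has an NA in any column. It records the dropped row numbers in its "na.action" attribute, which makes it easy to check how many rows were lost. A sketch on toy data; for M24selec the count should be 2016 - 1968 = 48.

```r
# Toy data with missing values in anio
df <- data.frame(
  edad = c(29L, 53L, 65L, 44L),
  anio = c(3L, NA, 4L, NA)
)

clean <- na.omit(df)

# Row numbers removed by na.omit()
removed <- attr(clean, "na.action")
length(removed)
as.integer(removed)

# No missing values remain
anyNA(clean)
```

Because na.omit() looks at every column, adding variables to the selection later could remove more rows. `df[!is.na(df$anio), ]` restricts the filter to anio only.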

Missing data (2023)

I check whether the variables of interest in the 2023 database have missing data:

summary(M23selec)
##     control         viv_sel         num_hog     hog_mud           n_ren_el     
##  Min.   :40001   Min.   :1.000   Min.   :1   Min.   :0.00000   Min.   : 1.000  
##  1st Qu.:40100   1st Qu.:2.000   1st Qu.:1   1st Qu.:0.00000   1st Qu.: 1.000  
##  Median :40199   Median :2.000   Median :1   Median :0.00000   Median : 1.000  
##  Mean   :40253   Mean   :2.486   Mean   :1   Mean   :0.05561   Mean   : 1.667  
##  3rd Qu.:40335   3rd Qu.:3.000   3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.: 2.000  
##  Max.   :41398   Max.   :4.000   Max.   :1   Max.   :3.00000   Max.   :10.000  
##                                                                                
##        cd           periodo         sexo            edad           nivel       
##  Min.   : 1.00   Min.   :223   Min.   :1.000   Min.   :18.00   Min.   : 0.000  
##  1st Qu.: 2.00   1st Qu.:223   1st Qu.:1.000   1st Qu.:32.00   1st Qu.: 3.000  
##  Median : 9.00   Median :223   Median :2.000   Median :45.00   Median : 4.000  
##  Mean   :13.85   Mean   :223   Mean   :1.585   Mean   :46.21   Mean   : 4.709  
##  3rd Qu.:25.00   3rd Qu.:223   3rd Qu.:2.000   3rd Qu.:59.00   3rd Qu.: 7.000  
##  Max.   :43.00   Max.   :223   Max.   :2.000   Max.   :97.00   Max.   :99.000  
##                                                                                
##       anio          cond_act            p1              p2       
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median : 1.000   Median :1.000   Median :1.000  
##  Mean   :3.521   Mean   : 3.269   Mean   :1.019   Mean   :1.419  
##  3rd Qu.:4.000   3rd Qu.: 7.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :7.000   Max.   :10.000   Max.   :2.000   Max.   :2.000  
##  NA's   :48                                                      
##       p3_1            p3_2            p3_3            p3_4            p3_5     
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.00  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.00  
##  Mean   :1.565   Mean   :1.742   Mean   :1.784   Mean   :1.909   Mean   :1.61  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.00  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.00  
##                                                                                
##        p4               p5         p5_6esp               p6_1       
##  Min.   : 0.000   Min.   :0.00   Length:1996        Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.00   Class :character   1st Qu.:0.0000  
##  Median : 0.000   Median :0.00   Mode  :character   Median :0.0000  
##  Mean   : 1.381   Mean   :1.34                      Mean   :0.7665  
##  3rd Qu.: 2.000   3rd Qu.:3.00                      3rd Qu.:2.0000  
##  Max.   :80.000   Max.   :6.00                      Max.   :2.0000  
##                                                                     
##       p6_2             p6_3             p6_4             p6_5       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.6784   Mean   :0.6759   Mean   :0.6473   Mean   :0.6979  
##  3rd Qu.:2.0000   3rd Qu.:2.0000   3rd Qu.:1.0000   3rd Qu.:2.0000  
##  Max.   :2.0000   Max.   :2.0000   Max.   :2.0000   Max.   :2.0000  
##                                                                     
##       p6_6          p6_6esp                p7              p7_3       
##  Min.   :0.0000   Length:1996        Min.   :0.0000   Min.   :     0  
##  1st Qu.:0.0000   Class :character   1st Qu.:0.0000   1st Qu.:     0  
##  Median :0.0000   Mode  :character   Median :0.0000   Median :     0  
##  Mean   :0.7956                      Mean   :0.8682   Mean   :  2137  
##  3rd Qu.:2.0000                      3rd Qu.:2.0000   3rd Qu.:     0  
##  Max.   :2.0000                      Max.   :3.0000   Max.   :999999  
##                                                                       
##       p8_1       
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.6904  
##  3rd Qu.:2.0000  
##  Max.   :2.0000  
## 

To compute the proportion of missing values in column 11 of the selection (anio):

# Proportion of NA values in column 11 (anio, the eleventh selected variable) of the 2023 selection
sum(is.na(M23selec[, 11])) / length(M23selec[, 11])
## [1] 0.0240481
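The value 0.0240481 is a proportion. It corresponds to 48 of the 1996 rows, about 2.4 %, which matches the NA's count for anio in the summary above. To get this figure for every column at once, expressed as a percentage, you can average is.na() column by column. The sketch below uses a hypothetical toy data frame, not the survey data.

```r
# Hypothetical toy data frame; replace with M23selec to inspect the real selection
example <- data.frame(
  nivel = c(4, 3, 2, 3),
  anio  = c(3, NA, 6, 3),
  p7    = c(NA, NA, 0, 2)
)

# Share of missing values per column (the mean of a logical vector is the proportion of TRUE)
prop_na <- colMeans(is.na(example))

# Same information as a percentage, rounded to two decimals
pct_na <- round(100 * prop_na, 2)
print(pct_na)
```

For a single column, `mean(is.na(M23selec$anio))` gives the same result as the expression above. It also selects the column by name, so the result does not depend on column position.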

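The na.omit() function is applied to remove the rows with missing data from the 2023 selection. It drops every row that has an NA in any column. Because anio is the only variable with NAs here, this removes its 48 rows and leaves 1948 observations: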

M23selec = na.omit(M23selec)
summary(M23selec)
##     control         viv_sel         num_hog     hog_mud           n_ren_el     
##  Min.   :40001   Min.   :1.000   Min.   :1   Min.   :0.00000   Min.   : 1.000  
##  1st Qu.:40101   1st Qu.:2.000   1st Qu.:1   1st Qu.:0.00000   1st Qu.: 1.000  
##  Median :40199   Median :2.000   Median :1   Median :0.00000   Median : 1.000  
##  Mean   :40250   Mean   :2.495   Mean   :1   Mean   :0.05339   Mean   : 1.669  
##  3rd Qu.:40333   3rd Qu.:3.000   3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.: 2.000  
##  Max.   :41398   Max.   :4.000   Max.   :1   Max.   :2.00000   Max.   :10.000  
##        cd           periodo         sexo            edad           nivel      
##  Min.   : 1.00   Min.   :223   Min.   :1.000   Min.   :18.00   Min.   :2.000  
##  1st Qu.: 2.00   1st Qu.:223   1st Qu.:1.000   1st Qu.:32.00   1st Qu.:3.000  
##  Median : 9.00   Median :223   Median :2.000   Median :45.00   Median :4.000  
##  Mean   :13.77   Mean   :223   Mean   :1.583   Mean   :45.82   Mean   :4.571  
##  3rd Qu.:25.00   3rd Qu.:223   3rd Qu.:2.000   3rd Qu.:58.00   3rd Qu.:7.000  
##  Max.   :43.00   Max.   :223   Max.   :2.000   Max.   :97.00   Max.   :9.000  
##       anio          cond_act            p1              p2       
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median : 1.000   Median :1.000   Median :1.000  
##  Mean   :3.521   Mean   : 3.244   Mean   :1.005   Mean   :1.437  
##  3rd Qu.:4.000   3rd Qu.: 7.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :7.000   Max.   :10.000   Max.   :2.000   Max.   :2.000  
##       p3_1            p3_2            p3_3            p3_4            p3_5     
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.00  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.00  
##  Mean   :1.585   Mean   :1.765   Mean   :1.808   Mean   :1.936   Mean   :1.63  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.00  
##  Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.00  
##        p4               p5          p5_6esp               p6_1       
##  Min.   : 0.000   Min.   :0.000   Length:1948        Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.000   Class :character   1st Qu.:0.0000  
##  Median : 0.000   Median :0.000   Mode  :character   Median :0.0000  
##  Mean   : 1.412   Mean   :1.363                      Mean   :0.7813  
##  3rd Qu.: 2.000   3rd Qu.:3.000                      3rd Qu.:2.0000  
##  Max.   :80.000   Max.   :6.000                      Max.   :2.0000  
##       p6_2            p6_3             p6_4             p6_5      
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.691   Mean   :0.6905   Mean   :0.6591   Mean   :0.711  
##  3rd Qu.:2.000   3rd Qu.:2.0000   3rd Qu.:2.0000   3rd Qu.:2.000  
##  Max.   :2.000   Max.   :2.0000   Max.   :2.0000   Max.   :2.000  
##       p6_6          p6_6esp                p7             p7_3       
##  Min.   :0.0000   Length:1948        Min.   :0.000   Min.   :     0  
##  1st Qu.:0.0000   Class :character   1st Qu.:0.000   1st Qu.:     0  
##  Median :0.0000   Mode  :character   Median :0.000   Median :     0  
##  Mean   :0.8111                      Mean   :0.885   Mean   :  2190  
##  3rd Qu.:2.0000                      3rd Qu.:2.000   3rd Qu.:     0  
##  Max.   :2.0000                      Max.   :3.000   Max.   :999999  
##       p8_1       
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.7033  
##  3rd Qu.:2.0000  
##  Max.   :2.0000
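You can confirm which rows na.omit() removed. It records their indices in a "na.action" attribute on the result. Keeping rows with complete.cases() selects the same rows. The sketch below demonstrates both on a hypothetical toy data frame with NAs in two different columns.

```r
# Hypothetical toy data frame with missing values in two different columns
example <- data.frame(
  edad = c(29, 53, NA, 44, 52),
  anio = c(3, NA, 6, 3, NA)
)

cleaned <- na.omit(example)

# Row indices that na.omit() removed (as.integer() drops the names and "omit" class)
removed <- as.integer(attr(cleaned, "na.action"))
print(removed)

# complete.cases() flags rows with no NA in any column; subsetting with it
# keeps the same rows as na.omit()
kept <- example[complete.cases(example), ]
print(nrow(kept))

# Compare row names only, since na.omit() adds an attribute that plain subsetting does not
print(identical(rownames(kept), rownames(cleaned)))
```

With the real data, store `nrow(M23selec)` before the na.omit() call and compare it with the count afterwards. The difference should be the 48 rows with a missing anio (1996 to 1948).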