R for working with the ENOE

Ana Escoto
17/06/2019

Release format
Preliminaries
Importing
- dbf format
- dta and sav formats
Applications
The basic filter
- Simple model
- Multiple model
Exercise

The survey

The National Survey of Occupation and Employment (Encuesta Nacional de Ocupación y Empleo, ENOE) is today the largest continuous household survey conducted in the country. Its launch in January 2005 marked the end of a data collection and processing model that had been in force for 20 years, corresponding to the National Urban Employment Survey (ENEU), followed by the National Employment Survey (ENE), into which the former was integrated. (INEGI, 2007)

This source is essential for studying labor markets in Mexico. The sample supports estimates for urban and rural areas and for a list of self-representing cities.

Release format

Until a couple of years ago the database was published in .dbf. Today it is released in four formats: .csv, .dta, .sav and .dbf.

https://www.inegi.org.mx/programas/enoe/15ymas/default.html#Microdatos

We will import the files from three types of format, using the first quarter of 2019, published in mid-May 2019. They can be downloaded from here:
https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/2019trim1_csv.zip
https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/2019trim1_dta.zip
https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/2019trim1_sav.zip
https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/2019trim1_dbf.zip

If there are connectivity problems, the files are also available at this link: https://www.dropbox.com/sh/2k9wujo8lmbv0qn/AAAJSIETI5WwR6GIWgzsAiN8a?dl=0
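The four download links differ only in the format suffix, so they can be assembled and fetched from within R. The following is a minimal sketch, assuming the INEGI addresses listed above are still live; `descargar_enoe` is a hypothetical helper written for this illustration, and the call that actually downloads is left commented out because it needs an internet connection.

```r
# Base address and file pattern taken from the links above
base_url <- "https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/"
formatos <- c("csv", "dta", "sav", "dbf")
urls <- paste0(base_url, "2019trim1_", formatos, ".zip")
names(urls) <- formatos

# Hypothetical helper: downloads one zip file and extracts it into `destino`
descargar_enoe <- function(formato, destino = tempdir()) {
  zip_path <- file.path(destino, basename(urls[[formato]]))
  download.file(urls[[formato]], destfile = zip_path, mode = "wb")
  unzip(zip_path, exdir = destino)  # returns the paths of the extracted files
}

urls[["dbf"]]
# descargar_enoe("dbf")  # uncomment to download; requires a connection
```

Passing `mode = "wb"` keeps the zip file intact on Windows, where the default text mode can corrupt binary downloads.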

Regardless of the format, the ENOE has five tables.

To learn more about what each type of table contains, we can consult the description of the tables at the following link: https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/doc/fd_c_bas_amp_15ymas.pdf

In this case we are working with the extended questionnaire, which is administered in the first quarter of each year. There is also a basic questionnaire, applied to individuals in the remaining visits. This follows a set calendar. In practice, since this is a first look, we will use the sociodemographic questionnaire, which includes variables computed by INEGI (to see how they were constructed, we can consult the documentation here: https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/doc/recons_var_15ymas.pdf)

Preliminaries

It is always useful to set a working directory first, which makes it easier to import our data.

setwd("C:/Users/Aula D1/Downloads/Taller2")

We will also work with these packages; they need to be installed if we do not already have them.

paquetes<-c("foreign","tidyverse", "sjlabelled", "stargazer", "sjPlot", "survey", "questionr")

And with this code we install them:

nuevos.paquetes<- paquetes[!(paquetes %in% installed.packages()[,"Package"])]
if(length(nuevos.paquetes)) install.packages(nuevos.paquetes, repos = "https://cran.itam.mx/")
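The two lines above rely on `%in%` to find which requested packages are missing and on `if()` treating a non-zero length as true. A small sketch of that logic, using one package that ships with every R installation (`stats`) and one made-up name (`paquete.que.no.existe`, an assumption for illustration only); nothing is installed here.

```r
# One package that is always present and one invented name
paquetes <- c("stats", "paquete.que.no.existe")

# Names of everything installed in the current library paths
instalados <- rownames(installed.packages())

# TRUE where the package is already installed
paquetes %in% instalados

# Keep only the ones that are NOT installed
nuevos.paquetes <- paquetes[!(paquetes %in% instalados)]
nuevos.paquetes

# length() is 1 here, and if() treats any non-zero number as TRUE,
# so install.packages() would only run when something is missing
length(nuevos.paquetes)
```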

Importing

dbf format

As noted in the previous section, we can import any type of data into R. Let's start with dbf. This format is more stable over time, which matters if you want to build series. In my experience, the .dta and .sav files can show changes in coding.

library(foreign)
sdemt119<-read.dbf("sdemt119.dbf")

Let's inspect the dataset with glimpse (part of dplyr, in the tidyverse)

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

glimpse(sdemt119)

## Observations: 406,036
## Variables: 104
## $ R_DEF      <fct> 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00,...
## $ LOC        <fct> 0001, 0001, 0001, 0001, 0001, 0001, 0001, 0001, 000...
## $ MUN        <fct> 002, 002, 002, 002, 002, 002, 002, 002, 002, 002, 0...
## $ EST        <fct> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ EST_D      <int> 117, 117, 117, 117, 117, 117, 117, 117, 117, 117, 1...
## $ AGEB       <fct> 00000, 00000, 00000, 00000, 00000, 00000, 00000, 00...
## $ T_LOC      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ CD_A       <fct> 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01,...
## $ ENT        <fct> 09, 09, 09, 09, 09, 09, 09, 09, 09, 09, 09, 09, 09,...
## $ CON        <fct> 40001, 40001, 40001, 40001, 40001, 40001, 40001, 40...
## $ UPM        <fct> 0900471, 0900471, 0900471, 0900471, 0900471, 090047...
## $ D_SEM      <fct> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1...
## $ N_PRO_VIV  <fct> 0001, 0001, 0001, 0001, 0001, 0023, 0023, 0023, 002...
## $ V_SEL      <fct> 01, 01, 01, 01, 01, 02, 02, 02, 02, 03, 03, 04, 04,...
## $ N_HOG      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ H_MUD      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ N_ENT      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ PER        <fct> 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 1...
## $ N_REN      <fct> 01, 02, 03, 04, 05, 01, 02, 03, 04, 01, 02, 01, 02,...
## $ C_RES      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ PAR_C      <fct> 101, 201, 301, 301, 415, 101, 301, 301, 301, 101, 3...
## $ SEX        <fct> 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, ...
## $ EDA        <fct> 46, 48, 16, 12, 76, 43, 18, 16, 13, 70, 41, 68, 34,...
## $ NAC_DIA    <fct> 27, 15, 05, 12, 10, 11, 06, 08, 04, 14, 13, 24, 19,...
## $ NAC_MES    <fct> 12, 01, 12, 06, 12, 08, 12, 08, 01, 09, 01, 09, 05,...
## $ NAC_ANIO   <fct> 1972, 1970, 2002, 2006, 1942, 1975, 2000, 2002, 200...
## $ L_NAC_C    <fct> 009, 009, 009, 009, 300, 009, 015, 015, 015, 009, 0...
## $ CS_P12     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ CS_P13_1   <fct> 06, 07, 03, 02, 07, 07, 04, 03, 03, 07, 07, 04, 07,...
## $ CS_P13_2   <fct> 2, 4, 3, 6, 3, 4, 2, 3, 1, 4, 4, 3, 4, 6, 4, 3, 6, ...
## $ CS_P14_C   <fct> 2814, 5313, NA, NA, 5711, 5441, NA, NA, NA, 5531, 5...
## $ CS_P15     <fct> 2, 3, NA, NA, 3, 3, NA, NA, NA, 3, 3, NA, 3, NA, 3,...
## $ CS_P16     <fct> 1, 1, NA, NA, 2, 1, NA, NA, NA, 1, 1, NA, 1, NA, 1,...
## $ CS_P17     <fct> 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, ...
## $ N_HIJ      <fct> NA, 02, NA, 00, 05, 03, NA, 00, 00, NA, 00, 01, NA,...
## $ E_CON      <fct> 5, 5, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 1, 5, 6, ...
## $ CS_AD_MOT  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_P20_DES <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_AD_DES  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_NR_MOT  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_P22_DES <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_NR_ORI  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ UR         <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ ZONA       <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ SALARIO    <int> 3080, 3080, 3080, 3080, 3080, 3080, 3080, 3080, 308...
## $ FAC        <int> 475, 475, 475, 475, 475, 475, 475, 475, 475, 475, 4...
## $ CLASE1     <int> 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1, 2, ...
## $ CLASE2     <int> 1, 1, 4, 4, 4, 1, 4, 4, 4, 2, 1, 1, 1, 4, 1, 2, 4, ...
## $ CLASE3     <int> 1, 1, 0, 0, 0, 1, 0, 0, 0, 6, 1, 3, 1, 0, 1, 6, 0, ...
## $ POS_OCU    <int> 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, ...
## $ SEG_SOC    <int> 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 0, ...
## $ RAMA       <int> 4, 3, 0, 0, 0, 4, 0, 0, 0, 0, 2, 4, 3, 0, 4, 0, 0, ...
## $ C_OCU11C   <int> 3, 3, 0, 0, 0, 4, 0, 0, 0, 0, 1, 4, 3, 0, 6, 0, 0, ...
## $ ING7C      <int> 7, 7, 0, 0, 0, 7, 0, 0, 0, 0, 5, 3, 7, 0, 1, 0, 0, ...
## $ DUR9C      <int> 8, 3, 0, 0, 0, 3, 0, 0, 0, 0, 4, 1, 7, 0, 8, 0, 0, ...
## $ EMPLE7C    <int> 2, 5, 0, 0, 0, 6, 0, 0, 0, 0, 5, 6, 7, 0, 7, 0, 0, ...
## $ MEDICA5C   <int> 1, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 3, 3, 0, 1, 0, 0, ...
## $ BUSCAR5C   <int> 4, 4, 0, 0, 0, 2, 0, 0, 0, 0, 4, 4, 4, 0, 4, 0, 0, ...
## $ RAMA_EST1  <int> 3, 3, 0, 0, 0, 3, 0, 0, 0, 0, 2, 3, 3, 0, 3, 0, 0, ...
## $ RAMA_EST2  <int> 6, 5, 0, 0, 0, 11, 0, 0, 0, 0, 3, 11, 5, 0, 7, 0, 0...
## $ DUR_EST    <int> 5, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 1, 5, 0, 5, 0, 0, ...
## $ AMBITO1    <int> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 3, 0, 3, 0, 0, ...
## $ AMBITO2    <int> 3, 5, 0, 0, 0, 7, 0, 0, 0, 0, 4, 7, 0, 0, 0, 0, 0, ...
## $ TUE1       <int> 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 2, 1, 0, 1, 0, 0, ...
## $ TUE2       <int> 2, 1, 0, 0, 0, 4, 0, 0, 0, 0, 1, 4, 1, 0, 1, 0, 0, ...
## $ TUE3       <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...
## $ BUSQUEDA   <int> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 2, 0, 0, ...
## $ D_ANT_LAB  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ...
## $ D_CEXP_EST <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, ...
## $ DUR_DES    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, ...
## $ SUB_O      <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ S_CLASIFI  <int> 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ REMUNE2C   <int> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, ...
## $ PRE_ASA    <int> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 0, ...
## $ TIP_CON    <int> 0, 3, 0, 0, 0, 2, 0, 0, 0, 0, 2, 3, 2, 0, 5, 0, 0, ...
## $ DISPO      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ NODISPO    <int> 0, 0, 3, 3, 3, 0, 9, 9, 3, 0, 0, 0, 0, 3, 0, 0, 3, ...
## $ C_INAC5C   <int> 0, 0, 1, 1, 2, 0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, ...
## $ PNEA_EST   <int> 0, 0, 4, 4, 4, 0, 3, 3, 4, 0, 0, 0, 0, 4, 0, 0, 4, ...
## $ NIV_INS    <int> 3, 4, 3, 2, 4, 4, 3, 3, 2, 4, 4, 4, 4, 2, 4, 4, 2, ...
## $ EDA5C      <int> 3, 3, 1, 0, 4, 2, 1, 1, 0, 4, 2, 4, 2, 3, 3, 2, 0, ...
## $ EDA7C      <int> 4, 4, 1, 0, 6, 4, 1, 1, 0, 6, 4, 6, 3, 6, 6, 4, 0, ...
## $ EDA12C     <int> 7, 7, 1, 0, 11, 6, 1, 1, 0, 11, 6, 11, 4, 10, 10, 6...
## $ EDA19C     <int> 12, 12, 6, 5, 18, 11, 6, 6, 5, 17, 11, 16, 9, 15, 1...
## $ HIJ5C      <int> 0, 2, 0, 1, 3, 3, 0, 1, 1, 0, 1, 2, 0, 3, 0, 2, 0, ...
## $ DOMESTICO  <int> 4, 3, 8, 8, 8, 3, 8, 8, 8, 3, 3, 3, 2, 8, 1, 3, 7, ...
## $ ANIOS_ESC  <int> 11, 16, 9, 6, 15, 16, 11, 9, 7, 16, 16, 12, 16, 6, ...
## $ HRSOCUP    <int> 65, 24, 0, 0, 0, 24, 0, 0, 0, 0, 32, 0, 50, 0, 60, ...
## $ INGOCUP    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20000, 7000, 0, 0, 0,...
## $ ING_X_HRS  <dbl> 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0...
## $ TPG_P8A    <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ TCCO       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, ...
## $ CP_ANOC    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IMSSISSSTE <int> 4, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 2, 1, 0, 4, 0, 0, ...
## $ MA48ME1SM  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ P14APOYOS  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ SCIAN      <int> 18, 6, 0, 0, 0, 20, 0, 0, 0, 0, 5, 20, 6, 0, 9, 0, ...
## $ T_TRA      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ EMP_PPAL   <int> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 1, 0, 0, ...
## $ TUE_PPAL   <int> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 2, 0, 0, ...
## $ TRANS_PPAL <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ MH_FIL2    <int> 3, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 3, 3, 0, 3, 0, 0, ...
## $ MH_COL     <int> 6, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 1, 0, 0, ...
## $ SEC_INS    <int> 4, 2, 0, 0, 0, 5, 0, 0, 0, 0, 2, 5, 2, 0, 2, 0, 0, ...

class(sdemt119) # object type

## [1] "data.frame"

names(sdemt119) # lists the variables

##   [1] "R_DEF"      "LOC"        "MUN"        "EST"        "EST_D"     
##   [6] "AGEB"       "T_LOC"      "CD_A"       "ENT"        "CON"       
##  [11] "UPM"        "D_SEM"      "N_PRO_VIV"  "V_SEL"      "N_HOG"     
##  [16] "H_MUD"      "N_ENT"      "PER"        "N_REN"      "C_RES"     
##  [21] "PAR_C"      "SEX"        "EDA"        "NAC_DIA"    "NAC_MES"   
##  [26] "NAC_ANIO"   "L_NAC_C"    "CS_P12"     "CS_P13_1"   "CS_P13_2"  
##  [31] "CS_P14_C"   "CS_P15"     "CS_P16"     "CS_P17"     "N_HIJ"     
##  [36] "E_CON"      "CS_AD_MOT"  "CS_P20_DES" "CS_AD_DES"  "CS_NR_MOT" 
##  [41] "CS_P22_DES" "CS_NR_ORI"  "UR"         "ZONA"       "SALARIO"   
##  [46] "FAC"        "CLASE1"     "CLASE2"     "CLASE3"     "POS_OCU"   
##  [51] "SEG_SOC"    "RAMA"       "C_OCU11C"   "ING7C"      "DUR9C"     
##  [56] "EMPLE7C"    "MEDICA5C"   "BUSCAR5C"   "RAMA_EST1"  "RAMA_EST2" 
##  [61] "DUR_EST"    "AMBITO1"    "AMBITO2"    "TUE1"       "TUE2"      
##  [66] "TUE3"       "BUSQUEDA"   "D_ANT_LAB"  "D_CEXP_EST" "DUR_DES"   
##  [71] "SUB_O"      "S_CLASIFI"  "REMUNE2C"   "PRE_ASA"    "TIP_CON"   
##  [76] "DISPO"      "NODISPO"    "C_INAC5C"   "PNEA_EST"   "NIV_INS"   
##  [81] "EDA5C"      "EDA7C"      "EDA12C"     "EDA19C"     "HIJ5C"     
##  [86] "DOMESTICO"  "ANIOS_ESC"  "HRSOCUP"    "INGOCUP"    "ING_X_HRS" 
##  [91] "TPG_P8A"    "TCCO"       "CP_ANOC"    "IMSSISSSTE" "MA48ME1SM" 
##  [96] "P14APOYOS"  "SCIAN"      "T_TRA"      "EMP_PPAL"   "TUE_PPAL"  
## [101] "TRANS_PPAL" "MH_FIL2"    "MH_COL"     "SEC_INS"

head(sdemt119) # shows the first 6 rows

##   R_DEF  LOC MUN EST EST_D  AGEB T_LOC CD_A ENT   CON     UPM D_SEM
## 1    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
## 2    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
## 3    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
## 4    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
## 5    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
## 6    00 0001 002  10   117 00000     1   01  09 40001 0900471   101
##   N_PRO_VIV V_SEL N_HOG H_MUD N_ENT PER N_REN C_RES PAR_C SEX EDA NAC_DIA
## 1      0001    01     1     0     1 119    01     1   101   1  46      27
## 2      0001    01     1     0     1 119    02     1   201   2  48      15
## 3      0001    01     1     0     1 119    03     1   301   1  16      05
## 4      0001    01     1     0     1 119    04     1   301   2  12      12
## 5      0001    01     1     0     1 119    05     1   415   2  76      10
## 6      0023    02     1     0     1 119    01     1   101   2  43      11
##   NAC_MES NAC_ANIO L_NAC_C CS_P12 CS_P13_1 CS_P13_2 CS_P14_C CS_P15 CS_P16
## 1      12     1972     009      1       06        2     2814      2      1
## 2      01     1970     009      1       07        4     5313      3      1
## 3      12     2002     009      1       03        3     <NA>   <NA>   <NA>
## 4      06     2006     009      1       02        6     <NA>   <NA>   <NA>
## 5      12     1942     300      1       07        3     5711      3      2
## 6      08     1975     009      1       07        4     5441      3      1
##   CS_P17 N_HIJ E_CON CS_AD_MOT CS_P20_DES CS_AD_DES CS_NR_MOT CS_P22_DES
## 1      2  <NA>     5      <NA>       <NA>      <NA>      <NA>       <NA>
## 2      2    02     5      <NA>       <NA>      <NA>      <NA>       <NA>
## 3      1  <NA>     6      <NA>       <NA>      <NA>      <NA>       <NA>
## 4      1    00     6      <NA>       <NA>      <NA>      <NA>       <NA>
## 5      2    05     4      <NA>       <NA>      <NA>      <NA>       <NA>
## 6      2    03     6      <NA>       <NA>      <NA>      <NA>       <NA>
##   CS_NR_ORI UR ZONA SALARIO FAC CLASE1 CLASE2 CLASE3 POS_OCU SEG_SOC RAMA
## 1      <NA>  1    2    3080 475      1      1      1       2       2    4
## 2      <NA>  1    2    3080 475      1      1      1       1       1    3
## 3      <NA>  1    2    3080 475      2      4      0       0       0    0
## 4      <NA>  1    2    3080 475      2      4      0       0       0    0
## 5      <NA>  1    2    3080 475      2      4      0       0       0    0
## 6      <NA>  1    2    3080 475      1      1      1       1       1    4
##   C_OCU11C ING7C DUR9C EMPLE7C MEDICA5C BUSCAR5C RAMA_EST1 RAMA_EST2
## 1        3     7     8       2        1        4         3         6
## 2        3     7     3       5        3        4         3         5
## 3        0     0     0       0        0        0         0         0
## 4        0     0     0       0        0        0         0         0
## 5        0     0     0       0        0        0         0         0
## 6        4     7     3       6        3        2         3        11
##   DUR_EST AMBITO1 AMBITO2 TUE1 TUE2 TUE3 BUSQUEDA D_ANT_LAB D_CEXP_EST
## 1       5       2       3    1    2    0        2         0          0
## 2       3       2       5    1    1    0        2         0          0
## 3       0       0       0    0    0    0        0         0          0
## 4       0       0       0    0    0    0        0         0          0
## 5       0       0       0    0    0    0        0         0          0
## 6       3       2       7    2    4    2        2         0          0
##   DUR_DES SUB_O S_CLASIFI REMUNE2C PRE_ASA TIP_CON DISPO NODISPO C_INAC5C
## 1       0     0         0        0       0       0     0       0        0
## 2       0     0         0        1       1       3     0       0        0
## 3       0     0         0        0       0       0     0       3        1
## 4       0     0         0        0       0       0     0       3        1
## 5       0     0         0        0       0       0     0       3        2
## 6       0     1         5        1       1       2     0       0        0
##   PNEA_EST NIV_INS EDA5C EDA7C EDA12C EDA19C HIJ5C DOMESTICO ANIOS_ESC
## 1        0       3     3     4      7     12     0         4        11
## 2        0       4     3     4      7     12     2         3        16
## 3        4       3     1     1      1      6     0         8         9
## 4        4       2     0     0      0      5     1         8         6
## 5        4       4     4     6     11     18     3         8        15
## 6        0       4     2     4      6     11     3         3        16
##   HRSOCUP INGOCUP ING_X_HRS TPG_P8A TCCO CP_ANOC IMSSISSSTE MA48ME1SM
## 1      65       0         0       0    0       0          4         0
## 2      24       0         0       0    0       0          1         0
## 3       0       0         0       0    0       0          0         0
## 4       0       0         0       0    0       0          0         0
## 5       0       0         0       0    0       0          0         0
## 6      24       0         0       1    0       0          2         0
##   P14APOYOS SCIAN T_TRA EMP_PPAL TUE_PPAL TRANS_PPAL MH_FIL2 MH_COL
## 1         0    18     1        2        2          0       3      6
## 2         0     6     1        2        2          0       3      2
## 3         0     0     1        0        0          0       0      0
## 4         0     0     1        0        0          0       0      0
## 5         0     0     1        0        0          0       0      0
## 6         0    20     1        2        2          0       3      2
##   SEC_INS
## 1       4
## 2       2
## 3       0
## 4       0
## 5       0
## 6       5

table(sdemt119$CLASE2) # a simple tabulation

## 
##      0      1      2      3      4 
##  84634 176997   6326  18681 119398
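In the dbf import, numeric-looking fields such as EDA arrive as factors (`<fct>` in the glimpse output), so they need converting before any arithmetic. A short sketch of the usual base-R idiom, using the first five EDA values printed above; the variable names `eda` and `eda_num` are chosen for this illustration.

```r
# Ages as they arrive from the dbf file: a factor, not numbers
eda <- factor(c("46", "48", "16", "12", "76"))

# Pitfall: this returns the internal level codes, not the ages
as.integer(eda)

# Convert through character first to recover the real values
eda_num <- as.numeric(as.character(eda))
eda_num

mean(eda_num)
```

The pitfall arises because a factor stores positions in its sorted list of levels, so `as.integer()` alone silently returns those positions.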

dta and sav formats

The advantage of working with these formats is that INEGI has labelled the variables. INEGI works with fairly recent versions of Stata and SPSS, so the foreign package cannot be used.

We will import the data with the "haven" package, part of the "tidyverse"

library(haven)
sdemt119<-read_spss("sdemt119.sav") # from SPSS
sdemt119<-read_dta("sdemt119.dta")  # from Stata; overwrites the object read above

Again we can inspect it, and we will see some differences:

glimpse(sdemt119) # inspect the structure

## Observations: 406,036
## Variables: 104
## $ R_DEF      <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ LOC        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ MUN        <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ EST        <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ EST_D      <dbl> 117, 117, 117, 117, 117, 117, 117, 117, 117, 117, 1...
## $ AGEB       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ T_LOC      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ CD_A       <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ ENT        <dbl+lbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,...
## $ CON        <dbl> 40001, 40001, 40001, 40001, 40001, 40001, 40001, 40...
## $ UPM        <dbl> 900471, 900471, 900471, 900471, 900471, 900471, 900...
## $ D_SEM      <dbl+lbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 10...
## $ N_PRO_VIV  <dbl> 1, 1, 1, 1, 1, 23, 23, 23, 23, 53, 53, 79, 79, 106,...
## $ V_SEL      <dbl+lbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5,...
## $ N_HOG      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ H_MUD      <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ N_ENT      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ PER        <dbl+lbl> 119, 119, 119, 119, 119, 119, 119, 119, 119, 11...
## $ N_REN      <dbl+lbl> 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3,...
## $ C_RES      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ PAR_C      <dbl> 101, 201, 301, 301, 415, 101, 301, 301, 301, 101, 3...
## $ SEX        <dbl+lbl> 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 1, 2,...
## $ EDA        <dbl> 46, 48, 16, 12, 76, 43, 18, 16, 13, 70, 41, 68, 34,...
## $ NAC_DIA    <dbl+lbl> 27, 15, 5, 12, 10, 11, 6, 8, 4, 14, 13, 24, 19,...
## $ NAC_MES    <dbl+lbl> 12, 1, 12, 6, 12, 8, 12, 8, 1, 9, 1, 9, 5, 9, 1...
## $ NAC_ANIO   <dbl> 1972, 1970, 2002, 2006, 1942, 1975, 2000, 2002, 200...
## $ L_NAC_C    <dbl+lbl> 9, 9, 9, 9, 300, 9, 15, 15, 15, 9, 9, 9, 9, 9, ...
## $ CS_P12     <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ CS_P13_1   <dbl+lbl> 6, 7, 3, 2, 7, 7, 4, 3, 3, 7, 7, 4, 7, 2, 7, 4,...
## $ CS_P13_2   <dbl+lbl> 2, 4, 3, 6, 3, 4, 2, 3, 1, 4, 4, 3, 4, 6, 4, 3,...
## $ CS_P14_C   <chr> "2814", "5313", "", "", "5711", "5441", "", "", "",...
## $ CS_P15     <dbl+lbl> 2, 3, NA, NA, 3, 3, NA, NA, NA, 3, 3, NA, 3, NA...
## $ CS_P16     <dbl+lbl> 1, 1, NA, NA, 2, 1, NA, NA, NA, 1, 1, NA, 1, NA...
## $ CS_P17     <dbl+lbl> 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,...
## $ N_HIJ      <dbl+lbl> NA, 2, NA, 0, 5, 3, NA, 0, 0, NA, 0, 1, NA, 4, ...
## $ E_CON      <dbl+lbl> 5, 5, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 1, 5,...
## $ CS_AD_MOT  <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_P20_DES <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",...
## $ CS_AD_DES  <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_NR_MOT  <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ CS_P22_DES <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",...
## $ CS_NR_ORI  <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ UR         <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ ZONA       <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,...
## $ SALARIO    <dbl> 3080, 3080, 3080, 3080, 3080, 3080, 3080, 3080, 308...
## $ FAC        <dbl> 475, 475, 475, 475, 475, 475, 475, 475, 475, 475, 4...
## $ CLASE1     <dbl+lbl> 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1,...
## $ CLASE2     <dbl+lbl> 1, 1, 4, 4, 4, 1, 4, 4, 4, 2, 1, 1, 1, 4, 1, 2,...
## $ CLASE3     <dbl+lbl> 1, 1, 0, 0, 0, 1, 0, 0, 0, 6, 1, 3, 1, 0, 1, 6,...
## $ POS_OCU    <dbl+lbl> 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ SEG_SOC    <dbl+lbl> 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0,...
## $ RAMA       <dbl+lbl> 4, 3, 0, 0, 0, 4, 0, 0, 0, 0, 2, 4, 3, 0, 4, 0,...
## $ C_OCU11C   <dbl+lbl> 3, 3, 0, 0, 0, 4, 0, 0, 0, 0, 1, 4, 3, 0, 6, 0,...
## $ ING7C      <dbl+lbl> 7, 7, 0, 0, 0, 7, 0, 0, 0, 0, 5, 3, 7, 0, 1, 0,...
## $ DUR9C      <dbl+lbl> 8, 3, 0, 0, 0, 3, 0, 0, 0, 0, 4, 1, 7, 0, 8, 0,...
## $ EMPLE7C    <dbl+lbl> 2, 5, 0, 0, 0, 6, 0, 0, 0, 0, 5, 6, 7, 0, 7, 0,...
## $ MEDICA5C   <dbl+lbl> 1, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 3, 3, 0, 1, 0,...
## $ BUSCAR5C   <dbl+lbl> 4, 4, 0, 0, 0, 2, 0, 0, 0, 0, 4, 4, 4, 0, 4, 0,...
## $ RAMA_EST1  <dbl+lbl> 3, 3, 0, 0, 0, 3, 0, 0, 0, 0, 2, 3, 3, 0, 3, 0,...
## $ RAMA_EST2  <dbl+lbl> 6, 5, 0, 0, 0, 11, 0, 0, 0, 0, 3, 11, 5, 0, 7, ...
## $ DUR_EST    <dbl+lbl> 5, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 1, 5, 0, 5, 0,...
## $ AMBITO1    <dbl+lbl> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 3, 0, 3, 0,...
## $ AMBITO2    <dbl+lbl> 3, 5, 0, 0, 0, 7, 0, 0, 0, 0, 4, 7, 0, 0, 0, 0,...
## $ TUE1       <dbl+lbl> 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 2, 1, 0, 1, 0,...
## $ TUE2       <dbl+lbl> 2, 1, 0, 0, 0, 4, 0, 0, 0, 0, 1, 4, 1, 0, 1, 0,...
## $ TUE3       <dbl+lbl> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ BUSQUEDA   <dbl+lbl> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 2, 0,...
## $ D_ANT_LAB  <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ D_CEXP_EST <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3,...
## $ DUR_DES    <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1,...
## $ SUB_O      <dbl+lbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ S_CLASIFI  <dbl+lbl> 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ REMUNE2C   <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ PRE_ASA    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0,...
## $ TIP_CON    <dbl+lbl> 0, 3, 0, 0, 0, 2, 0, 0, 0, 0, 2, 3, 2, 0, 5, 0,...
## $ DISPO      <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NODISPO    <dbl+lbl> 0, 0, 3, 3, 3, 0, 9, 9, 3, 0, 0, 0, 0, 3, 0, 0,...
## $ C_INAC5C   <dbl+lbl> 0, 0, 1, 1, 2, 0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0,...
## $ PNEA_EST   <dbl+lbl> 0, 0, 4, 4, 4, 0, 3, 3, 4, 0, 0, 0, 0, 4, 0, 0,...
## $ NIV_INS    <dbl+lbl> 3, 4, 3, 2, 4, 4, 3, 3, 2, 4, 4, 4, 4, 2, 4, 4,...
## $ EDA5C      <dbl+lbl> 3, 3, 1, 0, 4, 2, 1, 1, 0, 4, 2, 4, 2, 3, 3, 2,...
## $ EDA7C      <dbl+lbl> 4, 4, 1, 0, 6, 4, 1, 1, 0, 6, 4, 6, 3, 6, 6, 4,...
## $ EDA12C     <dbl+lbl> 7, 7, 1, 0, 11, 6, 1, 1, 0, 11, 6, 11, 4, 10, 1...
## $ EDA19C     <dbl+lbl> 12, 12, 6, 5, 18, 11, 6, 6, 5, 17, 11, 16, 9, 1...
## $ HIJ5C      <dbl+lbl> 0, 2, 0, 1, 3, 3, 0, 1, 1, 0, 1, 2, 0, 3, 0, 2,...
## $ DOMESTICO  <dbl+lbl> 4, 3, 8, 8, 8, 3, 8, 8, 8, 3, 3, 3, 2, 8, 1, 3,...
## $ ANIOS_ESC  <dbl> 11, 16, 9, 6, 15, 16, 11, 9, 7, 16, 16, 12, 16, 6, ...
## $ HRSOCUP    <dbl> 65, 24, 0, 0, 0, 24, 0, 0, 0, 0, 32, 0, 50, 0, 60, ...
## $ INGOCUP    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20000, 7000, 0, 0, 0,...
## $ ING_X_HRS  <dbl> 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0...
## $ TPG_P8A    <dbl+lbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ TCCO       <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,...
## $ CP_ANOC    <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ IMSSISSSTE <dbl+lbl> 4, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 2, 1, 0, 4, 0,...
## $ MA48ME1SM  <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ P14APOYOS  <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ SCIAN      <dbl+lbl> 18, 6, 0, 0, 0, 20, 0, 0, 0, 0, 5, 20, 6, 0, 9,...
## $ T_TRA      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ EMP_PPAL   <dbl+lbl> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 1, 0,...
## $ TUE_PPAL   <dbl+lbl> 2, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 2, 0,...
## $ TRANS_PPAL <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ MH_FIL2    <dbl+lbl> 3, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3, 3, 3, 0, 3, 0,...
## $ MH_COL     <dbl+lbl> 6, 2, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 0, 1, 0,...
## $ SEC_INS    <dbl+lbl> 4, 2, 0, 0, 0, 5, 0, 0, 0, 0, 2, 5, 2, 0, 2, 0,...
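One difference visible in this output is that identifier codes such as ENT or MUN now arrive as plain numbers (9, 2) rather than as the zero-padded text seen in the dbf import ("09", "002"). If padded codes are needed, for example to build a geographic key, they can be restored with `sprintf()`. A minimal sketch with made-up vectors; the third value of each vector and the names `ent_txt`, `mun_txt` and `clave` are assumptions for illustration.

```r
# Codes as numbers, the way haven delivers them
ent <- c(9, 9, 15)    # state code
mun <- c(2, 2, 104)   # municipality code

# Pad with leading zeros to the widths used in the dbf file
ent_txt <- sprintf("%02d", as.integer(ent))
mun_txt <- sprintf("%03d", as.integer(mun))

# Combined state + municipality key
clave <- paste0(ent_txt, mun_txt)
clave
```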

class(sdemt119) # class

## [1] "tbl_df"     "tbl"        "data.frame"

table(sdemt119$CLASE2)

## 
##      0      1      2      3      4 
##  84634 176997   6326  18681 119398

We see that, besides being a data.frame, the object carries more information. Objects of this kind let us work with tibble-style tables in the tidyverse.

We also have information about the labels. The "sjlabelled" package is a great help in using them

library(sjlabelled)

## 
## Attaching package: 'sjlabelled'

## The following objects are masked from 'package:haven':
## 
##     as_factor, read_sas, read_spss, read_stata, write_sas,
##     zap_labels

## The following object is masked from 'package:dplyr':
## 
##     as_label

table(as_label(sdemt119$CLASE2))

## 
##            No aplica    Población ocupada Población desocupada 
##                84634               176997                 6326 
##          Disponibles       No disponibles 
##                18681               119398

We could even create a new object

sdemt119_label<-as_label(sdemt119)

And we can request tables and apply other functions to this object

class(sdemt119_label)

## [1] "tbl_df"     "tbl"        "data.frame"

table(sdemt119_label$CLASE2)

## 
##            No aplica    Población ocupada Población desocupada 
##                84634               176997                 6326 
##          Disponibles       No disponibles 
##                18681               119398

Note that the variables change their properties

class(sdemt119$CLASE2)

## [1] "haven_labelled"

class(sdemt119_label$CLASE2)

## [1] "factor"

summary(sdemt119$CLASE2)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   1.000   1.781   4.000   4.000

prop.table(table(sdemt119$CLASE2))

## 
##          0          1          2          3          4 
## 0.20843965 0.43591455 0.01557990 0.04600824 0.29405767

summary(sdemt119_label$CLASE2)

##            No aplica    Población ocupada Población desocupada 
##                84634               176997                 6326 
##          Disponibles       No disponibles 
##                18681               119398

prop.table(table(sdemt119_label$CLASE2))

## 
##            No aplica    Población ocupada Población desocupada 
##           0.20843965           0.43591455           0.01557990 
##          Disponibles       No disponibles 
##           0.04600824           0.29405767
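The change of class shown above explains why `summary()` behaves differently on the two objects: numeric codes get a five-number summary, while a factor gets counts per category. The same behaviour can be reproduced in base R, as a rough analogue and not as sjlabelled's actual implementation. The sketch uses the first six SEX codes from the glimpse output and assumes the value labels 1 = Hombre, 2 = Mujer.

```r
# Numeric codes and their (assumed) value labels
sex_codes  <- c(1, 2, 1, 2, 2, 2)
sex_labels <- c(Hombre = 1, Mujer = 2)

# On the raw codes, summary() reports quantiles and a mean
summary(sex_codes)

# Turning codes into a factor attaches the labels as levels
sex_factor <- factor(sex_codes,
                     levels = unname(sex_labels),
                     labels = names(sex_labels))
class(sex_factor)

# On a factor, summary() reports counts per category
summary(sex_factor)
prop.table(table(sex_factor))
```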

Applications

In this section we will do a couple of calculations for the Mexican labor market.

Income

INEGI computes income variables that have been added to the sociodemographic questionnaire.

Numerical measures

We will work with "ING_X_HRS", labor income per hour.

summary(sdemt119$ING_X_HRS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   12.24   17.23 3750.00

However, this is misleading, because the data include minors and people who do not work. We can add a filter. In base R code, filters go inside square brackets. We restrict the data to the employed population, identified by the variable "CLASE2".

summary(sdemt119[sdemt119$CLASE2==1,]$ING_X_HRS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00   21.43   28.07   36.73 3750.00
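To see what the brackets are doing, it helps to run the same filter on a tiny table. The sketch below uses a toy data frame with invented values (not ENOE records); `toy` and `ocupados` are names chosen for this illustration.

```r
# Toy data: three employed people (CLASE2 == 1) and three others
toy <- data.frame(
  CLASE2    = c(1, 1, 4, 4, 2, 1),
  ING_X_HRS = c(20, 40, 0, 0, 0, 30)
)

# The condition is just a logical vector, one value per row
ocupados <- toy$CLASE2 == 1
ocupados

# Rows go before the comma; the empty slot after it keeps all columns
mean(toy$ING_X_HRS)              # everyone, zeros included: 15
mean(toy[ocupados, ]$ING_X_HRS)  # employed only: 30
mean(toy$ING_X_HRS[ocupados])    # same result, filtering the vector directly
```

The zeros of people outside employment pull the overall mean down, which is the same effect seen in the full data.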

In R we can also use "pipes" to filter

sdemt119 %>% 
    filter(CLASE2 == 1)%>%
    summarise(avg_ing= mean(ING_X_HRS))

## # A tibble: 1 x 1
##   avg_ing
##     <dbl>
## 1    28.1

This format is useful when we want to compare across groups

sdemt119 %>% 
    filter(CLASE2 == 1)%>%
      group_by(as_label(SEX)) %>%
      summarise(avg_ing = mean(ING_X_HRS))

## # A tibble: 2 x 2
##   `as_label(SEX)` avg_ing
##   <fct>             <dbl>
## 1 Hombre             28.5
## 2 Mujer              27.4

sdemt119 %>% 
    filter(CLASE2 == 1)%>%
      group_by(as_label(SEX)) %>%
      summarise(avg_ing_hr = mean(ING_X_HRS),
                avg_ing_total = mean(INGOCUP),
                avg_horas = mean(HRSOCUP))

## # A tibble: 2 x 4
##   `as_label(SEX)` avg_ing_hr avg_ing_total avg_horas
##   <fct>                <dbl>         <dbl>     <dbl>
## 1 Hombre                28.5         5235.      44.1
## 2 Mujer                 27.4         4005.      36.3
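For comparison, the same kind of grouped summary can be sketched in base R with aggregate(). The data frame below is invented for illustration; only the column names mirror the ENOE variables.

```r
# Invented values; only the column names follow the ENOE
toy <- data.frame(
  SEX       = factor(c("Hombre", "Hombre", "Mujer", "Mujer")),
  ING_X_HRS = c(30, 20, 25, 35),
  HRSOCUP   = c(48, 40, 30, 40)
)

# One row per group, one column per summarised variable
by_sex <- aggregate(cbind(ING_X_HRS, HRSOCUP) ~ SEX, data = toy, FUN = mean)
by_sex
```

The piped version reads more naturally once several summaries are needed, which is presumably why this document prefers it.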

Plots

Histogram

library(ggplot2)
# Histogram
ggplot(sdemt119[sdemt119$CLASE2==1,], aes(x=ING_X_HRS)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Density plot

# Density
ggplot(sdemt119[sdemt119$CLASE2==1,], aes(x=ING_X_HRS)) + geom_density()

Comparing groups

ggplot(sdemt119[sdemt119$CLASE2==1,], 
       aes(x=ING_X_HRS, 
           fill=as_label(SEX), 
           color=as_label(SEX),
           alpha=I(0.5))
       ) + geom_density()

Transformations

ggplot(sdemt119[sdemt119$CLASE2==1,], 
       aes(x=log(ING_X_HRS+1), 
           fill=as_label(SEX), 
           color=as_label(SEX),
           alpha=I(0.5))
       ) + geom_density()

If we keep only those who receive income

ggplot(sdemt119[sdemt119$CLASE2==1 & sdemt119$ING_X_HRS>0,], 
       aes(x=log(ING_X_HRS), 
           fill=as_label(SEX), 
           color=as_label(SEX),
           alpha=I(0.5))
       ) + geom_density()
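The two plots above differ in how they treat zero incomes. A small sketch with invented values shows why: log(0) is -Inf, so zeros must either be shifted (log(x + 1), which base R also offers as log1p()) or filtered out before taking logs.

```r
ing <- c(0, 0, 10, 50, 200)  # invented hourly incomes

log(ing)                       # the two zeros become -Inf

shifted  <- log1p(ing)         # same as log(ing + 1); zeros map to 0
positive <- log(ing[ing > 0])  # zeros dropped before the transformation

all.equal(shifted, log(ing + 1))  # TRUE
length(positive)                  # 3
```

Shifting keeps every worker but piles the zero earners into a spike at 0, while filtering describes only those with positive income.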

Occupations

Income reflects one part of working conditions, but it is also important to look at the tasks workers perform

table(sdemt119[sdemt119$CLASE2==1,]$C_OCU11C)

## 
##     1     2     3     4     5     6     7     8     9    10    11 
## 19308  6927  3278 16541 48244 31327  8878 26448  1579 14351   116

table(as_label(sdemt119[sdemt119$CLASE2==1,]$C_OCU11C))

## 
##                                       No aplica 
##                                               0 
## Profesionales, técnicos y trabajadores del arte 
##                                           19308 
##                    Trabajadores de la educación 
##                                            6927 
##                       Funcionarios y directivos 
##                                            3278 
##                                     Oficinistas 
##                                           16541 
## Trabajadores industriales artesanos y ayudantes 
##                                           48244 
##                                    Comerciantes 
##                                           31327 
##                        Operadores de transporte 
##                                            8878 
##            Trabajadores en servicios personales 
##                                           26448 
##         Trabajadores en protección y vigilancia 
##                                            1579 
##                      Trabajadores agropecuarios 
##                                           14351 
##                                 No especificado 
##                                             116

If we save this table, we can plot it

freq.ocu<-table(as_label(sdemt119[sdemt119$CLASE2==1,]$C_OCU11C))
barplot(freq.ocu)

To use ggplot, we need a data frame

gg.freq.ocu<-as.data.frame(freq.ocu)
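A short sketch of what this conversion produces, using invented occupations. When the table is built from an expression such as `toy$ocu`, as.data.frame() names the category column `Var1` and the count column `Freq`, which are the names the ggplot call relies on.

```r
# Invented occupations, for illustration only
toy <- data.frame(
  ocu = c("Comerciantes", "Oficinistas", "Comerciantes", "Comerciantes")
)

freq <- table(toy$ocu)      # frequency table
gg   <- as.data.frame(freq) # one row per category

names(gg)  # "Var1" "Freq"
gg
```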

And with this we can use ggplot

g<-ggplot(data=gg.freq.ocu, aes(x=Var1, y=Freq)) +
  geom_bar(stat="identity")
g

g + theme(axis.text.x = element_text(angle = 90, hjust = 1))

Using expansion factors

When we have a survey, we can make statements about the population. The first way to do so is with the expansion factor, which gives the weight of each observation according to the sampling design (socioeconomic stratum and geographic location, in this case).

The expansion factor is a number: each person in the sample represents that many people in the population. In essence, then, it is a multiplication.
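As a sketch of that multiplication, the weighted count of a category is simply the sum of the expansion factors of the sampled people in it. The values below are invented; `FAC` stands in for the ENOE weight.

```r
# Invented sample of five people
toy <- data.frame(
  SEX = c(1, 2, 2, 1, 2),
  FAC = c(300, 250, 400, 150, 100)  # each person represents FAC people
)

# Unweighted: people in the sample
table(toy$SEX)

# Weighted: people in the population, i.e. the sum of FAC within each category
pop <- tapply(toy$FAC, toy$SEX, sum)
pop

# xtabs() gives the same totals from a formula
xtabs(FAC ~ SEX, data = toy)
```

Here two sampled people with SEX equal to 1 stand for 450 people, and three with SEX equal to 2 stand for 750.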

library(questionr)

wtd.table(sdemt119$SEX,weights=sdemt119$FAC)

##        1        2 
## 60567328 64952571

fw_sex<-wtd.table(sdemt119$SEX,weights=sdemt119$FAC)
addmargins(fw_sex)

##         1         2       Sum 
##  60567328  64952571 125519899

With filters: note that the weight must carry the same filter as the variable

wtd.table(sdemt119[sdemt119$CLASE2==1,]$C_OCU11C,weights=sdemt119[sdemt119$CLASE2==1,]$FAC)

##        1        2        3        4        5        6        7        8 
##  5431837  1917223   966289  4456947 14251202  9795530  2759197  7816823 
##        9       10       11 
##   500035  6594666    33201

fw_ocu<-wtd.table(sdemt119[sdemt119$CLASE2==1,]$C_OCU11C,weights=sdemt119[sdemt119$CLASE2==1,]$FAC)
addmargins(fw_ocu)

##        1        2        3        4        5        6        7        8 
##  5431837  1917223   966289  4456947 14251202  9795530  2759197  7816823 
##        9       10       11      Sum 
##   500035  6594666    33201 54522950

prop.table(fw_ocu)

##            1            2            3            4            5 
## 0.0996247819 0.0351635962 0.0177226104 0.0817444214 0.2613798776 
##            6            7            8            9           10 
## 0.1796588409 0.0506061576 0.1433675727 0.0091710922 0.1209521128 
##           11 
## 0.0006089362
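One way to make sure the variable and the weight always share the same filter is to subset the data frame once and take both columns from the result. A sketch with invented values, written in base R so that it runs without questionr (`ocupados` is a name chosen here for illustration):

```r
# Invented values; column names follow the ENOE
toy <- data.frame(
  CLASE2   = c(1, 1, 2, 1, 3),
  C_OCU11C = c(5, 6, 0, 5, 0),
  FAC      = c(300, 250, 400, 150, 100)
)

# Filter once, then reuse the filtered object for both the variable and the weight
ocupados <- toy[which(toy$CLASE2 == 1), ]
fw <- tapply(ocupados$FAC, ocupados$C_OCU11C, sum)
fw

# Weighted shares, computed by hand
fw / sum(fw)
```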

The questionr package is not compatible with sjlabelled.

There is another package, survey, which handles more complex designs. We can include information on the sampling stages, the primary sampling unit, and so on.

library(survey)

## Loading required package: grid

## Loading required package: Matrix

## Loading required package: survival

## 
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
## 
##     dotchart

ds_enoe<-svydesign(ids=~UPM, strata=~EST_D, weights=~FAC, data=sdemt119[sdemt119$CLASE2==1,], nest=TRUE)
options(survey.lonely.psu="adjust")
svytotal(~factor(C_OCU11C), ds_enoe)

##                       total       SE
## factor(C_OCU11C)1   5431837  72274.3
## factor(C_OCU11C)2   1917223  43113.7
## factor(C_OCU11C)3    966289  36565.7
## factor(C_OCU11C)4   4456947  63167.8
## factor(C_OCU11C)5  14251202 145894.4
## factor(C_OCU11C)6   9795530 112520.6
## factor(C_OCU11C)7   2759197  51422.4
## factor(C_OCU11C)8   7816823  94257.8
## factor(C_OCU11C)9    500035  22612.4
## factor(C_OCU11C)10  6594666 129687.7
## factor(C_OCU11C)11    33201   5217.1
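To try the same pattern on data that ships with the package, here is a sketch using the `apistrat` example data set from survey (a stratified sample of California schools, not ENOE data). It declares the strata and the weights, then estimates totals and proportions with their standard errors.

```r
library(survey)
data(api)  # loads apistrat, among other example data sets

# One-stage stratified design: strata in stype, sampling weights in pw
ds_api <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = apistrat)

tot  <- svytotal(~stype, ds_api)  # estimated number of schools by type
prop <- svymean(~stype, ds_api)   # estimated share of schools by type

tot
prop
```

Applied to a factor, svymean() returns weighted proportions, so it plays the role that prop.table() played for the weighted table, with standard errors added.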

Recoding

Both age and schooling contain special codes that are not real numeric values (such as 99). We recode them to NA so that invalid values are excluded from our calculations and we no longer have to filter them out each time

sdemt119$EDA[sdemt119$EDA==99]<-NA
sdemt119$EDA[sdemt119$EDA==98]<-NA

sdemt119$ANIOS_ESC[sdemt119$ANIOS_ESC==99]<-NA
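A compact way to recode several special codes at once is %in%. The sketch below uses invented ages; after recoding, summary functions need na.rm = TRUE to skip the missing values.

```r
eda <- c(25, 40, 98, 99, 33)  # invented ages with two special codes

eda[eda %in% c(98, 99)] <- NA  # both codes become missing in one step

sum(is.na(eda))          # 2
mean(eda)                # NA, because missing values propagate
mean(eda, na.rm = TRUE)  # mean of 25, 40 and 33
```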

Linear regression

We present a very simple model to explain labor income. We will use sex, age, and education. Will the income gap still be so small?

The basic filter

Now some more complex filters. If we are going to compare models, N must be the same in all of them

sdemt119$filtro<- 0
sdemt119$filtro[sdemt119$CLASE2 == 1 & sdemt119$EDA>=15 & sdemt119$EDA<=98 & sdemt119$R_DEF==0 & (sdemt119$C_RES==1 | sdemt119$C_RES==3)] <- 1
sdemt119$filtro <-as.numeric(sdemt119$filtro)

mydata<-sdemt119[which(sdemt119$filtro==1), c("ING_X_HRS", "SEX", "ANIOS_ESC", "EDA")]
base_modelos <- na.omit(mydata)
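The reason na.omit() helps keep N equal: lm() silently drops rows with a missing value in any variable it uses, so models with different predictors can end up fitted on different samples. A sketch with invented data:

```r
# Invented data with one missing value in x2
toy <- data.frame(
  y  = c(10, 12, 15, 11, 20, 18),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, NA, 1, 5, 3, 4)
)

# On the raw data, the second model loses the row where x2 is missing
nobs(lm(y ~ x1, data = toy))       # 6
nobs(lm(y ~ x1 + x2, data = toy))  # 5

# After na.omit(), both models use the same complete rows
toy_cc <- na.omit(toy)
m0 <- lm(y ~ x1, data = toy_cc)
m1 <- lm(y ~ x1 + x2, data = toy_cc)
c(nobs(m0), nobs(m1))  # 5 5
```

Selecting only the model variables before calling na.omit(), as done above with `mydata`, avoids discarding rows because of missing values in unrelated columns.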

Simple model

lm.0<-lm(ING_X_HRS ~ ANIOS_ESC, data=base_modelos)
summary(lm.0)

## 
## Call:
## lm(formula = ING_X_HRS ~ ANIOS_ESC, data = base_modelos)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -46.2  -26.2   -5.2    9.7 3713.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.18612    0.28933   49.03   <2e-16 ***
## ANIOS_ESC    1.33373    0.02548   52.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.17 on 175487 degrees of freedom
## Multiple R-squared:  0.01537,    Adjusted R-squared:  0.01537 
## F-statistic:  2740 on 1 and 175487 DF,  p-value: < 2.2e-16

However, this model does not satisfy the basic assumptions of linear regression: the residuals reported above are heavily right-skewed (median -5.2, maximum 3713.1).
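One quick way to see this kind of problem is to measure how lopsided the residuals are. The sketch below uses simulated data (not the ENOE) in which income is log-normal, and compares the residual skewness of a model in levels with one on the log scale, the transformation already used in the density plots. The `skew` helper and all variable names are defined here for illustration.

```r
set.seed(123)
n   <- 2000
esc <- sample(0:20, n, replace = TRUE)           # simulated years of schooling
ing <- exp(2 + 0.08 * esc + rnorm(n, sd = 0.8))  # simulated positive incomes

# Sample skewness: roughly 0 when residuals are symmetric
skew <- function(e) mean((e - mean(e))^3) / sd(e)^3

m_lvl <- lm(ing ~ esc)       # income in levels
m_log <- lm(log(ing) ~ esc)  # income on the log scale

skew(resid(m_lvl))  # clearly positive: long right tail
skew(resid(m_log))  # close to zero
```

Whether a log model suits the ENOE data would need to be checked on the actual residuals, and zero incomes would have to be handled first.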

Multiple model

lm.1<-lm(ING_X_HRS ~ ANIOS_ESC + EDA + as_label(SEX) , data=base_modelos)
summary(lm.1)

## 
## Call:
## lm(formula = ING_X_HRS ~ ANIOS_ESC + EDA + as_label(SEX), data = base_modelos)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -53.4  -24.5   -4.3    9.9 3710.2 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.663622   0.490803  11.539   <2e-16 ***
## ANIOS_ESC           1.508118   0.026387  57.154   <2e-16 ***
## EDA                 0.189236   0.008071  23.447   <2e-16 ***
## as_label(SEX)Mujer -2.026740   0.225073  -9.005   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.09 on 175485 degrees of freedom
## Multiple R-squared:  0.01886,    Adjusted R-squared:  0.01885 
## F-statistic:  1125 on 3 and 175485 DF,  p-value: < 2.2e-16

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

stargazer(lm.0, lm.1, type = 'text', header=FALSE)

## 
## ===============================================================================
##                                         Dependent variable:                    
##                     -----------------------------------------------------------
##                                              ING_X_HRS                         
##                                  (1)                           (2)             
## -------------------------------------------------------------------------------
## ANIOS_ESC                     1.334***                      1.508***           
##                                (0.025)                       (0.026)           
##                                                                                
## EDA                                                         0.189***           
##                                                              (0.008)           
##                                                                                
## as_label(SEX)Mujer                                          -2.027***          
##                                                              (0.225)           
##                                                                                
## Constant                      14.186***                     5.664***           
##                                (0.289)                       (0.491)           
##                                                                                
## -------------------------------------------------------------------------------
## Observations                   175,489                       175,489           
## R2                              0.015                         0.019            
## Adjusted R2                     0.015                         0.019            
## Residual Std. Error     46.173 (df = 175487)          46.091 (df = 175485)     
## F Statistic         2,739.707*** (df = 1; 175487) 1,124.634*** (df = 3; 175485)
## ===============================================================================
## Note:                                               *p<0.1; **p<0.05; ***p<0.01
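Beyond reading the table side by side, nested models fitted on the same rows can be compared formally with anova(), which tests whether the added predictors reduce the residual sum of squares. A sketch on simulated data (all variable names and coefficients are invented):

```r
set.seed(1)
n   <- 500
esc <- sample(0:20, n, replace = TRUE)   # simulated schooling
eda <- sample(15:70, n, replace = TRUE)  # simulated age
ing <- 5 + 1.5 * esc + 0.2 * eda + rnorm(n, sd = 10)

m0 <- lm(ing ~ esc)        # simple model
m1 <- lm(ing ~ esc + eda)  # adds age

cmp <- anova(m0, m1)  # F test for the extra term
cmp
```

This comparison is only valid when both models use exactly the same observations, which is what the filter and na.omit() step ensures.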

Exercise

Summarize the variables "ING_X_HRS", "INGOCUP", and "HRSOCUP" by locality size (T_LOC). - Hint: reuse the code we wrote for the query by sex, but change the variable
Make a density plot of the logarithm of hourly income by locality size
Add the variable "T_LOC" to the linear regression model and save it in an object called "lm.2". Note that this variable is not in "mydata", so you will have to include it when selecting columns from the data set.
Make a table of the three fitted models and compare them.