INTRODUCTION
I loaded the packages needed for my estimations and then read in the database.
library(haven)
library(car)
library(stargazer)
library(lmtest)
SDEMT118 <- read_dta(file.choose())
I then dropped some rows.
SDEMT118 <- SDEMT118[SDEMT118$eda > 18,] # keep only respondents older than 18
SDEMT118 <- SDEMT118[SDEMT118$ingocup > 0,] # keep only incomes greater than zero
SDEMT118 <- SDEMT118[SDEMT118$anios_esc != 99,] # drop years of schooling coded 99 (information not available)
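One caveat with bracket subsetting is that a missing value in a filter column produces an all-NA row instead of dropping the observation. The following is only a minimal sketch of the same three filters written with subset(), which drops rows where the condition is NA. The data frame `toy` and its values are invented for illustration; only the column names come from the survey.

```r
# Toy data with the survey's column names (values are made up)
toy <- data.frame(
  eda       = c(17, 25, 40, NA, 33),
  ingocup   = c(0, 5000, 0, 3000, 8000),
  anios_esc = c(9, 12, 99, 6, 16)
)

# subset() keeps rows where the condition is TRUE and drops FALSE and NA
filtered <- subset(toy, eda > 18 & ingocup > 0 & anios_esc != 99)
print(filtered)
```

Whether the real data contain missing values in these columns is not something the output above shows, so this is a precaution rather than a correction.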
I encoded some variables as dummies to make the regression easier.
SDEMT118$mujer <- recode(SDEMT118$sex, "1=0; 2=1") # 1 = female; 0 = male
table(SDEMT118$mujer)
0 1
67727 44933
SDEMT118$casado <- recode(SDEMT118$e_con, "5=1; 1=0; 2=0; 3=0; 4=0; 6=0; 9=0") # 1 = married; 0 = any other marital status
table(SDEMT118$casado)
0 1
61811 50849
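The same dummies can also be built with a logical comparison, which avoids depending on which package's recode() is currently attached. This is a sketch on invented codes, assuming (as in the recoding above) that sex = 2 means female and e_con = 5 means married.

```r
# Invented example codes, not survey data
sex   <- c(1, 2, 2, 1, 2)
e_con <- c(5, 1, 5, 6, 9)

mujer  <- as.integer(sex == 2)    # 1 = female, 0 = male
casado <- as.integer(e_con == 5)  # 1 = married, 0 = any other status

table(mujer, casado)
```

One difference to keep in mind: the comparison maps every code other than 5 to 0, while recode() leaves codes that are not listed in its specification unchanged.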
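I ran the regression on the variables of interest and then summarized it with “stargazer” to make the results easier to interpret.
reg <- lm(ingocup ~ mujer + eda + I(eda^2) + anios_esc + casado + hrsocup, data = SDEMT118) # specification matches the coefficients reported below
stargazer(reg, type = "text")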
=================================================
                    Dependent variable:
                    -----------------------------
                    ingocup
-------------------------------------------------
mujer               -1,674.128***
                    (33.637)
eda                 225.596***
                    (6.501)
I(eda2)             -1.897***
                    (0.073)
anios_esc           532.111***
                    (3.877)
casado              400.821***
                    (33.925)
hrsocup             38.751***
                    (0.908)
Constant            -5,614.665***
                    (143.866)
-------------------------------------------------
Observations        112,660
R2                  0.186
Adjusted R2         0.186
Residual Std. Error 5,289.041 (df = 112653)
F Statistic         4,285.675*** (df = 6; 112653)
=================================================
Note:               *p<0.1; **p<0.05; ***p<0.01
confint(reg, level = 0.99) # 99% confidence intervals (0.5% and 99.5% bounds)
                   0.5 %       99.5 %
(Intercept) -5985.245723 -5244.083605
mujer       -1760.773857 -1587.482721
eda           208.851245   242.340426
I(eda^2)       -2.085561    -1.708883
anios_esc     522.125467   542.096777
casado        313.435587   488.207189
hrsocup        36.410549    41.090605
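Because eda enters with a positive linear term and a negative squared term, predicted income rises with age and then falls. As a small worked sketch, the age at which the estimated profile peaks follows from setting the derivative with respect to eda to zero. The two coefficients are copied from the regression table above; the variable names are mine.

```r
# Coefficients copied from the regression output above
b_eda  <- 225.596   # coefficient on eda
b_eda2 <- -1.897    # coefficient on I(eda^2)

# The derivative b_eda + 2 * b_eda2 * eda equals zero at the turning point
turning_point <- -b_eda / (2 * b_eda2)
round(turning_point, 1)
```

With these estimates the peak falls a little under 60 years of age, holding the other regressors fixed.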
Hypothesis testing with our model
Each coefficient is statistically significant at the 5% level. In fact, none of the 99% confidence intervals above contains zero, so each one is also significant at the 1% level.
I then plotted a histogram of the log of the “ingocup” variable.
attach(SDEMT118)
hist(log(ingocup), ylim = c(0, 40000), col = 2:16, main = "Log of monthly income")
bptest(reg)
studentized Breusch-Pagan test
data: reg
BP = 697.06, df = 6, p-value < 2.2e-16
Next I run a Breusch-Pagan test and a White test and interpret the results.
Breusch-Pagan test on the fitted values and their squares (the special case of the White test)
bptest(reg, ~fitted(reg) + I(fitted(reg)^2))
studentized Breusch-Pagan test
data: reg
BP = 1110.1, df = 2, p-value < 2.2e-16
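To make clear what the test statistic measures, here is a minimal sketch that computes a Breusch-Pagan statistic by hand on simulated data: the squared OLS residuals are regressed on the regressors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The data are simulated, not the survey, and I am assuming this n * R-squared form corresponds to the studentized version that bptest() reports by default.

```r
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)   # error spread grows with x (heteroskedastic)

fit <- lm(y ~ x)
aux <- lm(resid(fit)^2 ~ x)         # auxiliary regression of squared residuals

bp_stat <- n * summary(aux)$r.squared                    # LM statistic
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)   # df = number of regressors in aux
c(BP = bp_stat, p.value = p_value)
```

A small p-value rejects the null hypothesis of constant error variance, which is the same reading applied to the bptest() outputs above.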
White heteroskedasticity-robust standard errors
reg3 <- coeftest(reg, hccm)
reg3
t test of coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept) -5.6147e+03 1.4484e+02 -38.766 < 2.2e-16 ***
mujer       -1.6741e+03 3.2742e+01 -51.130 < 2.2e-16 ***
eda          2.2560e+02 6.0300e+00  37.412 < 2.2e-16 ***
I(eda^2)    -1.8972e+00 7.1811e-02 -26.420 < 2.2e-16 ***
anios_esc    5.3211e+02 6.0771e+00  87.560 < 2.2e-16 ***
casado       4.0082e+02 3.5058e+01  11.433 < 2.2e-16 ***
hrsocup      3.8751e+01 1.0034e+00  38.620 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
stargazer(reg, reg3, type = "text")
===============================================================
                    Dependent variable:
                    -------------------------------------------
                    ingocup
                    OLS                   coefficient
                                          test
                    (1)                   (2)
---------------------------------------------------------------
mujer               -1,674.128***         -1,674.128***
                    (33.637)              (32.742)
eda                 225.596***            225.596***
                    (6.501)               (6.030)
I(eda2)             -1.897***             -1.897***
                    (0.073)               (0.072)
anios_esc           532.111***            532.111***
                    (3.877)               (6.077)
casado              400.821***            400.821***
                    (33.925)              (35.058)
hrsocup             38.751***             38.751***
                    (0.908)               (1.003)
Constant            -5,614.665***         -5,614.665***
                    (143.866)             (144.836)
---------------------------------------------------------------
Observations        112,660
R2                  0.186
Adjusted R2         0.186
Residual Std. Error 5,289.041 (df = 112653)
F Statistic         4,285.675*** (df = 6; 112653)
===============================================================
Note:               *p<0.1; **p<0.05; ***p<0.01
Conclusion: there is heteroskedasticity, because the p-value (< 2.2e-16) is below 0.05 and the null hypothesis of homoskedasticity is rejected. I therefore adjust by using heteroskedasticity-robust standard errors.
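For readers who want to see what a robust standard error is, the sketch below computes the HC1 sandwich estimator by hand on simulated data. It is an illustration under my own assumptions (simulated x and y, a single regressor), not the survey model. Note that coeftest(reg, hccm) relies on hccm's default type, which is hc3, while the later call in this report requests type = "hc1".

```r
set.seed(2)
n <- 300
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)   # heteroskedastic errors

fit <- lm(y ~ x)
X <- model.matrix(fit)              # n x k design matrix
e <- resid(fit)
k <- ncol(X)

bread <- solve(crossprod(X))        # (X'X)^-1
meat  <- crossprod(X * e)           # X' diag(e^2) X, each row of X scaled by its residual
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread   # HC1 degrees-of-freedom scaling

se_robust <- sqrt(diag(vcov_hc1))
se_ols    <- sqrt(diag(vcov(fit)))
cbind(se_ols, se_robust)
```

The coefficients themselves do not change; only the standard errors, and therefore the t statistics, differ from the usual OLS output, which is what the two columns of the stargazer comparison above show.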
attach(SDEMT118)
The following objects are masked from SDEMT118 (pos = 3):
ageb, ambito1, ambito2, anios_esc, buscar5c, busqueda, c_inac5c, c_ocu11c, c_res,
casado, cd_a, clase1, clase2, clase3, con, cp_anoc, cs_ad_des, cs_ad_mot, cs_nr_mot,
cs_nr_ori, cs_p12, cs_p13_1, cs_p13_2, cs_p14_c, cs_p15, cs_p16, cs_p17, cs_p20_des,
cs_p22_des, d_ant_lab, d_cexp_est, d_sem, dispo, domestico, dur_des, dur_est, dur9c,
e_con, eda, eda12c, eda19c, eda5c, eda7c, emp_ppal, emple7c, ent, est, est_d, fac,
h_mud, hij5c, hrsocup, imssissste, ing_x_hrs, ing7c, ingocup, l_nac_c, loc, ma48me1sm,
medica5c, mh_col, mh_fil2, mujer, mun, n_ent, n_hij, n_hog, n_pro_viv, n_ren,
nac_anio, nac_dia, nac_mes, niv_ins, nodispo, p14apoyos, par_c, per, pnea_est,
pos_ocu, pre_asa, r_def, rama, rama_est1, rama_est2, remune2c, s_clasifi, salario,
scian, sec_ins, seg_soc, sex, sub_o, t_loc, t_tra, tcco, tip_con, tpg_p8a, trans_ppal,
tue_ppal, tue1, tue2, tue3, upm, ur, v_sel, zona
marrfem <- as.numeric(mujer==1 & casado ==1) #married female
singmal <- as.numeric(mujer==0 & casado==0) #single male
singfem <- as.numeric(mujer==1 & casado==0) #single female
reg2 <- lm(log(ingocup) ~ marrfem + singmal + singfem + eda + I(eda^2) + anios_esc + hrsocup)
stargazer(reg2, type = "text")
=================================================
                    Dependent variable:
                    -----------------------------
                    log(ingocup)
-------------------------------------------------
marrfem             -0.352***
                    (0.006)
singmal             -0.085***
                    (0.005)
singfem             -0.319***
                    (0.005)
eda                 0.040***
                    (0.001)
I(eda2)             -0.0004***
                    (0.00001)
anios_esc           0.074***
                    (0.0005)
hrsocup             0.011***
                    (0.0001)
Constant            6.617***
                    (0.019)
-------------------------------------------------
Observations        112,660
R2                  0.302
Adjusted R2         0.302
Residual Std. Error 0.641 (df = 112652)
F Statistic         6,971.641*** (df = 7; 112652)
=================================================
Note:               *p<0.1; **p<0.05; ***p<0.01
coeftest(reg2, vcov = hccm(reg2, type = "hc1"))
t test of coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept)  6.6170e+00 2.0543e-02 322.110 < 2.2e-16 ***
marrfem     -3.5198e-01 6.5972e-03 -53.353 < 2.2e-16 ***
singmal     -8.5322e-02 5.0904e-03 -16.762 < 2.2e-16 ***
singfem     -3.1894e-01 5.3188e-03 -59.964 < 2.2e-16 ***
eda          3.9727e-02 9.0720e-04  43.791 < 2.2e-16 ***
I(eda^2)    -4.1828e-04 1.0797e-05 -38.742 < 2.2e-16 ***
anios_esc    7.4409e-02 5.2588e-04 141.494 < 2.2e-16 ***
hrsocup      1.1132e-02 1.3647e-04  81.573 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I created three dummy variables to run the new regression and estimate the effect of being a married female, a single male, or a single female, each measured relative to the omitted base group of married males.
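Only three of the four sex-by-marital-status groups enter the regression, because including all four together with the constant would be perfectly collinear. The sketch below, on an invented five-person sample, builds all four indicators and checks that they always sum to one. The name marrmal for the base group is hypothetical and does not appear in the regression.

```r
# Invented sample; mujer and casado are 0/1 as in the recoding step
toy <- data.frame(
  mujer  = c(0, 0, 1, 1, 1),
  casado = c(1, 0, 1, 0, 0)
)

toy$marrfem <- as.numeric(toy$mujer == 1 & toy$casado == 1)  # married female
toy$singmal <- as.numeric(toy$mujer == 0 & toy$casado == 0)  # single male
toy$singfem <- as.numeric(toy$mujer == 1 & toy$casado == 0)  # single female
toy$marrmal <- as.numeric(toy$mujer == 0 & toy$casado == 1)  # base group, left out of the regression

# Each person falls in exactly one group, so the four dummies sum to 1
group_sum <- rowSums(toy[, c("marrfem", "singmal", "singfem", "marrmal")])
group_sum
```

Storing the dummies as columns of the data frame also lets lm() be called with a data argument instead of relying on attach().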
SDEMT118$job <- recode(SDEMT118$emp_ppal, "1=0; 2=1") # 1 = formal job; 0 = informal job
reg2 <- lm(log(ingocup) ~ job + mujer + eda + I(eda^2) + anios_esc + casado + hrsocup, data=SDEMT118)
stargazer(reg2, type = "text")
=================================================
                    Dependent variable:
                    -----------------------------
                    log(ingocup)
-------------------------------------------------
job                 0.400***
                    (0.004)
mujer               -0.280***
                    (0.004)
eda                 0.037***
                    (0.001)
I(eda2)             -0.0004***
                    (0.00001)
anios_esc           0.057***
                    (0.0005)
casado              0.019***
                    (0.004)
hrsocup             0.009***
                    (0.0001)
Constant            6.678***
                    (0.017)
-------------------------------------------------
Observations        112,660
R2                  0.357
Adjusted R2         0.357
Residual Std. Error 0.615 (df = 112652)
F Statistic         8,939.598*** (df = 7; 112652)
=================================================
Note:               *p<0.1; **p<0.05; ***p<0.01
CONCLUSION
According to economic theory, there are no redundant variables because I have specified the variables correctly, and there is certainly no perfect collinearity among the independent variables.
I wondered whether other variables might also influence monthly income, so, using the ENOE data, I took into account the formal versus informal job variable (job). I ran the regression, and the coefficient on job (0.400 in the log-income equation) indicates that, holding the other variables fixed, someone with a formal job earns about 49% more than someone without one, since exp(0.400) - 1 ≈ 0.49. I use dummy variables for this regression too.
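Since the dependent variable is log(ingocup), the coefficient on a dummy is a difference in logs, and reading it directly as a percentage is only an approximation that gets worse as the coefficient grows. A short sketch of the exact conversion, using the job coefficient copied from the regression table above (the variable names are mine):

```r
# Coefficient on job copied from the log-income regression output
b_job <- 0.400

# Exact percentage difference implied by a dummy in a log-level model
pct_job <- 100 * (exp(b_job) - 1)
round(pct_job, 1)
```

The same formula applies to the negative coefficients on the group dummies, where the exact percentage gap is smaller in absolute value than the raw coefficient suggests.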