As part of the final project, we analyze the ENOE database, which requires the following libraries:
library(haven)
library(car)
library(stargazer)
library(lmtest)
SDEMT118 <- read_dta("C:/Users/HP 425/Downloads/SDEMT118.dta")
View(SDEMT118)
We estimate the following model:
log(ingocup) = B0 + B1 mujer + B2 eda + B3 eda^2 + B4 anios_esc + B5 casado + B6 hrsocup + u
With the independent variables:
table(SDEMT118$sex) # sex
table(SDEMT118$eda) # age
table(SDEMT118$anios_esc) # years of school
table(SDEMT118$e_con) # marital status
table(SDEMT118$hrsocup) # hours worked per week
And the dependent variable:
summary(SDEMT118$ingocup) # monthly income
Rows removed:
SDEMT118 <- SDEMT118[SDEMT118$eda > 18,] # keep only adults (older than 18)
SDEMT118 <- SDEMT118[SDEMT118$ingocup > 0,] # keep only positive incomes
SDEMT118 <- SDEMT118[SDEMT118$anios_esc != 99,] # drop 99 = years of school not available
Recoded variables:
SDEMT118$mujer <- recode(SDEMT118$sex, "1=0; 2=1") # mujer: 1 = woman, 0 = man
table(SDEMT118$mujer)
SDEMT118$casado <- recode(SDEMT118$e_con, "5=1; 1=0; 2=0; 3=0; 4=0; 6=0; 9=0") # casado: 1 = married, 0 = any other status
table(SDEMT118$casado)
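The recoding logic can be checked on a small example before applying it to the full survey. The sketch below uses base R `ifelse` instead of `car::recode`, on a made-up data frame `toy` (a hypothetical name, with invented values). One difference to keep in mind: `recode` leaves codes that are not listed unchanged, while `ifelse` sends every non-matching code to 0.

```r
# Toy data standing in for the ENOE columns (values are invented for illustration)
toy <- data.frame(sex = c(1, 2, 2, 1), e_con = c(5, 6, 5, 1))

# mujer = 1 for women (sex == 2), 0 for men (sex == 1)
toy$mujer <- ifelse(toy$sex == 2, 1, 0)

# casado = 1 only when e_con == 5; every other status becomes 0
toy$casado <- ifelse(toy$e_con == 5, 1, 0)

print(toy)
```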
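reg <- lm(log(ingocup) ~ mujer + eda + I(eda^2) + anios_esc + casado + hrsocup, data = SDEMT118) # model call reconstructed from the output below
stargazer(reg, type = 'text')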
=================================================
                        Dependent variable:
                    -----------------------------
                            log(ingocup)
-------------------------------------------------
mujer                         -0.283***
                               (0.004)
eda                           0.040***
                               (0.001)
I(eda2)                      -0.0004***
                              (0.00001)
anios_esc                     0.075***
                              (0.0005)
casado                        0.038***
                               (0.004)
hrsocup                       0.011***
                              (0.0001)
Constant                      6.540***
                               (0.017)
-------------------------------------------------
Observations                   112,660
R2                              0.301
Adjusted R2                     0.301
Residual Std. Error      0.641 (df = 112653)
F Statistic         8,081.319*** (df = 6; 112653)
=================================================
Note:                 *p<0.1; **p<0.05; ***p<0.01
The estimated equation is: log(ingocup) = 6.540 - 0.283 mujer + 0.040 eda - 0.0004 eda^2 + 0.075 anios_esc + 0.038 casado + 0.011 hrsocup
Holding the other variables constant, this means:
-A woman's monthly income is about 28.3% lower than a man's
-Each additional year of age is associated with about 4.0% more income, at a decreasing rate because the coefficient on eda2 is negative
-One more year of education is associated with about 7.5% more income per month
-A married person has a monthly income about 3.8% higher than someone who is not married
-Each additional hour worked per week is associated with about 1.1% more income per month
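Reading a coefficient of a log-level model directly as a percentage is an approximation that works well for small coefficients and less well for large ones such as mujer. As a minimal sketch, the exact percentage change is 100 * (exp(b) - 1), and the age at which predicted income peaks is -b_eda / (2 * b_eda2). The vector `b` below is a hypothetical helper that holds the rounded coefficients from the table, so the turning point (around 50 years) is only as precise as that rounding.

```r
# Rounded coefficients as reported in the regression table
b <- c(mujer = -0.283, eda = 0.040, eda2 = -0.0004,
       anios_esc = 0.075, casado = 0.038, hrsocup = 0.011)

# Exact percentage change implied by a log-level coefficient
exact_pct <- 100 * (exp(b[c("mujer", "anios_esc", "casado", "hrsocup")]) - 1)
print(round(exact_pct, 1))

# Age at which the quadratic age profile reaches its maximum
turning_point <- unname(-b["eda"] / (2 * b["eda2"]))
print(turning_point)
```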
We want to know whether the independent variables (mujer, eda, eda2, anios_esc, casado and hrsocup) have an effect on the dependent variable (the log of ingocup).
The coefficients on mujer and eda2 are negative: women have a lower income, and the effect of age weakens as age increases. The coefficients on eda, anios_esc, casado and hrsocup are positive. To check whether these effects are statistically significant, we perform a t test on each coefficient.
We use the data to test the following hypotheses. In each case the null hypothesis is that the variable has no effect on monthly income in pesos, and the alternative is one-sided, in the direction of the estimated sign:
mujer: H0: b(mujer) = 0 against H1: b(mujer) < 0.
eda: H0: b(eda) = 0 against H1: b(eda) > 0.
eda2: H0: b(eda2) = 0 against H1: b(eda2) < 0.
anios_esc (years of schooling): H0: b(anios_esc) = 0 against H1: b(anios_esc) > 0.
casado (married): H0: b(casado) = 0 against H1: b(casado) > 0.
hrsocup (hours worked per week): H0: b(hrsocup) = 0 against H1: b(hrsocup) > 0.
Hypothesis tests
With N - K - 1 = 112,653 degrees of freedom, the one-sided critical value at the 5% level is 1.65 (or -1.65 when the alternative is negative). The null hypothesis is rejected when the t statistic lies beyond the critical value in the direction of the alternative.
mujer: t = -0.283 / 0.004 = -70.75, which is below -1.65, so the null hypothesis is rejected at the 5% significance level.
eda: t = 0.040 / 0.001 = 40, which is above 1.65, so the null hypothesis is rejected at the 5% significance level.
I(eda2): t = -0.0004 / 0.00001 = -40, which is below -1.65, so the null hypothesis is rejected at the 5% significance level.
anios_esc: t = 0.075 / 0.0005 = 150, which is above 1.65, so the null hypothesis is rejected at the 5% significance level.
casado: t = 0.038 / 0.004 = 9.5, which is above 1.65, so the null hypothesis is rejected at the 5% significance level.
hrsocup: t = 0.011 / 0.0001 = 110, which is above 1.65, so the null hypothesis is rejected at the 5% significance level.
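The t statistics can be reproduced from the reported coefficients and standard errors. The sketch below is only an illustration: the vectors `est`, `se` and `h1_negative` are hypothetical helpers filled in by hand from the rounded table values, and the critical value comes from `qt` with the 112,653 degrees of freedom.

```r
# Rounded estimates and standard errors copied from the regression table
est <- c(mujer = -0.283, eda = 0.040, eda2 = -0.0004,
         anios_esc = 0.075, casado = 0.038, hrsocup = 0.011)
se  <- c(mujer = 0.004, eda = 0.001, eda2 = 0.00001,
         anios_esc = 0.0005, casado = 0.004, hrsocup = 0.0001)

t_stat <- est / se

# One-sided 5% critical value for 112,653 degrees of freedom
crit <- qt(0.95, df = 112653)

# Direction of the alternative: negative for mujer and eda2, positive otherwise
h1_negative <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
reject <- ifelse(h1_negative, t_stat < -crit, t_stat > crit)

print(data.frame(t_stat = round(t_stat, 2), reject_H0 = reject))
```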
bptest(reg)
studentized Breusch-Pagan test
data: reg
BP = 4151.9, df = 6, p-value < 2.2e-16
-If p-value < 0.05, the null hypothesis of homoscedasticity is rejected: there is heteroscedasticity
-If p-value > 0.05, we fail to reject the null hypothesis: there is no evidence against homoscedasticity
In this case the p-value (below 2.2e-16) is smaller than 0.05, so the null hypothesis is rejected: the model shows heteroscedasticity.
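To see what the test measures, the studentized Breusch-Pagan statistic can be computed by hand: regress the squared OLS residuals on the regressors and multiply the R-squared of that auxiliary regression by the sample size. The sketch below does this in base R on simulated data, not on the ENOE sample; all names (`x`, `y`, `fit`, `aux`) are hypothetical.

```r
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = x)   # error variance grows with x

fit <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the regressors
u2  <- resid(fit)^2
aux <- lm(u2 ~ x)

# LM statistic = n * R^2, chi-squared with df = number of regressors (here 1)
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(c(LM = lm_stat, p_value = p_value))
```

A small p-value here has the same reading as in the output above: the variance of the errors depends on the regressors.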
bptest(reg, ~ fitted(reg) + I(fitted(reg)^2))
studentized Breusch-Pagan test
data: reg
BP = 2774.6, df = 2, p-value < 2.2e-16
This second call is the special form of the White test, which uses the fitted values and their squares, and its decision rule is the same as for the Breusch-Pagan test. The p-value (below 2.2e-16) is again smaller than 0.05, so the null hypothesis of homoscedasticity is rejected: there is heteroscedasticity.
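Because both tests detect heteroscedasticity, we recompute the t tests with heteroscedasticity-robust (HC0) standard errors:
coeftest(reg, vcov = hccm(reg, type="hc0"))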
t test of coefficients:

               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  6.54015639  0.01919766 340.6746 < 2.2e-16 ***
mujer       -0.28333031  0.00411492 -68.8544 < 2.2e-16 ***
eda          0.03991880  0.00090668  44.0274 < 2.2e-16 ***
I(eda^2)    -0.00041783  0.00001079 -38.7245 < 2.2e-16 ***
anios_esc    0.07457117  0.00052620 141.7169 < 2.2e-16 ***
casado       0.03820120  0.00421374   9.0659 < 2.2e-16 ***
hrsocup      0.01123985  0.00013674  82.1986 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
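The HC0 standard errors come from the "sandwich" formula (X'X)^-1 X' diag(u^2) X (X'X)^-1. As a rough sketch of what `hccm(reg, type="hc0")` is computing, the example below builds that matrix by hand in base R on simulated data. The names are hypothetical and the numbers are unrelated to the ENOE results.

```r
set.seed(2)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = x)   # heteroscedastic errors

fit <- lm(y ~ x)
X <- model.matrix(fit)                # n x 2 design matrix (intercept, x)
u <- resid(fit)

bread <- solve(crossprod(X))          # (X'X)^-1
meat  <- crossprod(X * u)             # X' diag(u^2) X: each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

robust_se <- sqrt(diag(vcov_hc0))
usual_se  <- sqrt(diag(vcov(fit)))
print(cbind(usual_se, robust_se))
```

The coefficient estimates do not change under this correction; only the standard errors, and therefore the t values, do.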
Now we create three dummy variables (married women, single men and single women) and remove mujer and casado from the original equation, so that married men become the base group. Here "single" means any status other than married (casado = 0).
marrfem <- as.numeric(SDEMT118$mujer==1 & SDEMT118$casado==1)
singman <- as.numeric(SDEMT118$mujer==0 & SDEMT118$casado==0)
singfem <- as.numeric(SDEMT118$mujer==1 & SDEMT118$casado==0)
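reg2 <- lm(log(ingocup) ~ marrfem + singman + singfem + eda + I(eda^2) + anios_esc + hrsocup, data = SDEMT118) # model call reconstructed from the output below
stargazer(reg2, type = 'text')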
=================================================
                        Dependent variable:
                    -----------------------------
                            log(ingocup)
-------------------------------------------------
marrfem                       -0.352***
                               (0.006)
singman                       -0.085***
                               (0.005)
singfem                       -0.319***
                               (0.005)
eda                           0.040***
                               (0.001)
I(eda2)                      -0.0004***
                              (0.00001)
anios_esc                     0.074***
                              (0.0005)
hrsocup                       0.011***
                              (0.0001)
Constant                      6.617***
                               (0.019)
-------------------------------------------------
Observations                   112,660
R2                              0.302
Adjusted R2                     0.302
Residual Std. Error      0.641 (df = 112652)
F Statistic         6,971.641*** (df = 7; 112652)
=================================================
Note:                 *p<0.1; **p<0.05; ***p<0.01
The estimated equation is:
log(ingocup) = 6.617 - 0.352 marrfem - 0.085 singman - 0.319 singfem + 0.040 eda - 0.0004 eda^2 + 0.074 anios_esc + 0.011 hrsocup
Holding the other variables constant, and taking married men as the base group, this means:
-A married woman has a monthly income about 35.2% lower than a married man
-An unmarried man has a monthly income about 8.5% lower than a married man
-An unmarried woman has a monthly income about 31.9% lower than a married man
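Since every group coefficient is measured against married men, comparisons between any two groups follow by subtraction. The short sketch below uses the rounded coefficients from the table; the vector `b2` and the two gap names are hypothetical helpers. It suggests that marriage is associated with higher income for men but slightly lower income for women, in log points.

```r
# Group coefficients relative to married men (the base group, coefficient 0)
b2 <- c(marrman = 0, marrfem = -0.352, singman = -0.085, singfem = -0.319)

# Married minus unmarried, within each gender (in log points)
marriage_gap_men   <- unname(b2["marrman"] - b2["singman"])
marriage_gap_women <- unname(b2["marrfem"] - b2["singfem"])

print(c(men = marriage_gap_men, women = marriage_gap_women))
```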
Yes: based on economic theory, we consider that the first regression model has a problem of redundant variables, and that the second model better captures the relationship between marital status, gender and their effect on people's monthly income.
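One way to relate the two specifications: three group dummies with a fourth group as the base give the same fit as a model with mujer, casado and their interaction, which the first model leaves out. The sketch below checks this equivalence on simulated data. All values are invented, and the variable names simply mirror the ones used in this report.

```r
set.seed(3)
n <- 400
mujer  <- rbinom(n, 1, 0.5)
casado <- rbinom(n, 1, 0.5)
y <- 1 - 0.3 * mujer + 0.1 * casado - 0.1 * mujer * casado + rnorm(n)

# Group dummies, with married men as the omitted base group
marrfem <- as.numeric(mujer == 1 & casado == 1)
singman <- as.numeric(mujer == 0 & casado == 0)
singfem <- as.numeric(mujer == 1 & casado == 0)

fit_groups <- lm(y ~ marrfem + singman + singfem)
fit_inter  <- lm(y ~ mujer * casado)   # mujer + casado + mujer:casado

# Both parameterizations produce the same fitted values
same_fit <- isTRUE(all.equal(unname(fitted(fit_groups)), unname(fitted(fit_inter))))
print(same_fit)
```

Under this reading, a t test on the interaction term in the second parameterization would indicate whether the effect of marriage differs between men and women.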