JOYCE,MEI, CAROL,TAMI 11/1/2020
## The following object is masked _by_ .GlobalEnv:
##
## OWNCOST
use_varb <- (AGE >= 25) & (AGE <= 55) & (LABFORCE == 2) & (WKSWORK2 > 4) & (UHRSWORK >= 35)&(Asian=1)
dat_use <- subset(acs2017_ny,use_varb) #
detach()##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + female, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90258 -39251 -16422 12852 603267
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -86399.070 8408.739 -10.28 <2e-16 ***
## AGE 7641.924 431.330 17.72 <2e-16 ***
## I(AGE^2) -82.642 5.327 -15.51 <2e-16 ***
## female -18264.550 760.754 -24.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82030 on 46967 degrees of freedom
## Multiple R-squared: 0.02899, Adjusted R-squared: 0.02892
## F-statistic: 467.3 on 3 and 46967 DF, p-value: < 2.2e-16
## The following objects are masked _by_ .GlobalEnv:
##
## Asian, OWNCOST
NNobs <- length(INCWAGE)
set.seed(12345)
graph_obs <- (runif(NNobs) < 0.1)
dat_graph <-subset(dat_use,graph_obs)
a<-plot(INCWAGE ~ jitter(AGE, factor = 2), pch = 16, col = rgb(0.5, 0.7, 0.8, alpha = 0.8), ylim = c(0,150000), data = dat_graph)
to_be_predicted1 <- data.frame(AGE = 25:55, female = 1, AfAm = 0, Asian = 1, Amindian = 0, race_oth = 0, Hispanic =0, educ_hs = 0, educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
to_be_predicted1$yhat <- predict(model1, newdata = to_be_predicted1)
lines(yhat ~ AGE, data = to_be_predicted1) AGE, AGE^2,female and educ_college are all significant. 1. It’s necessary to add up AGE^2 to adjust the model.
2.INCWAGE= f(AGE)= -82.642AGE^2+7641.924 AGE-86399.070 -18264.550 female The function reaches its peach at f’(AGE)=0, -300.78AGE+13611.11=0 (AGE=46).
When D1=0 (male)
E(INCWAGE)= -82.642AGE^2+7641.924 AGE-86399.070
E(INCWAGE)max=f(46)=90248
When D1=1(female) E(INCWAGE)=-150.39AGE^2+ 13611.11AGE-179132.90 -22415.91female=67832.78
The wages of male is$22415.91 higher than female. Adding gender in the model change the intercept of the function.
## Estimating income With more dummies.改
model1a <-lm(INCWAGE~AGE+I(AGE^2)+female+Asian+educ_hs+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
summary(model1a)##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + female + Asian + educ_hs +
## educ_somecoll + educ_college + educ_advdeg, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -138710 -32970 -10697 13008 624344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.145e+05 8.101e+03 -14.135 < 2e-16 ***
## AGE 6.587e+03 4.051e+02 16.258 < 2e-16 ***
## I(AGE^2) -6.509e+01 5.003e+00 -13.012 < 2e-16 ***
## female -2.537e+04 7.182e+02 -35.325 < 2e-16 ***
## Asian -1.077e+03 1.194e+03 -0.902 0.367
## educ_hs 1.337e+04 1.787e+03 7.481 7.52e-14 ***
## educ_somecoll 2.576e+04 1.822e+03 14.141 < 2e-16 ***
## educ_college 6.149e+04 1.783e+03 34.493 < 2e-16 ***
## educ_advdeg 8.658e+04 1.823e+03 47.483 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76760 on 46962 degrees of freedom
## Multiple R-squared: 0.1499, Adjusted R-squared: 0.1498
## F-statistic: 1035 on 8 and 46962 DF, p-value: < 2.2e-16
Q:What is the predicted wage, from your model, for a few relevant cases? Do those seem reasonable? 1.E(INCWAGE)=-1.145e+05+ 6.587e+03AGE+ -6.509e+01AGE^2-2.537e+04female+ 1.337e+04 educ_hs + 2.576e+04 educ_somecoll+ 6.149e+04 educ_college+ 8.658e+04educ_advdeg.
2.Except for Asian,all other variables are significant. In another word, income is not relative to ethnicity, so just drop Asian in the model. By adding more variables, R squares increases by 14.9%, which is an ok result.
3.By using this model, we can estimate the marginal impacts of each variable holding other factors unchanged. For example, female earn $2.537e+04 less than male per year, people have advanced education has income $8.658e+04 higher than those who are in educ_ohs.
PS:My confusion is that, if I drop educ_nohs in the regression, then the coefficients of the rest of the education index are making comparision to educ_nohs?
## Adding higher polynomial terms of age
model2<-lm(INCWAGE~AGE+I(AGE^2)+I(AGE^3)+I(AGE^4)+female,data=dat_use)
summary(model2)##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + I(AGE^3) + I(AGE^4) +
## female, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89585 -39074 -16528 12946 603245
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.076e+05 2.022e+05 1.027 0.3046
## AGE -2.577e+04 2.138e+04 -1.205 0.2282
## I(AGE^2) 1.303e+03 8.303e+02 1.569 0.1167
## I(AGE^3) -2.484e+01 1.404e+01 -1.768 0.0770 .
## I(AGE^4) 1.626e-01 8.741e-02 1.860 0.0628 .
## female -1.827e+04 7.607e+02 -24.012 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82030 on 46965 degrees of freedom
## Multiple R-squared: 0.02913, Adjusted R-squared: 0.02903
## F-statistic: 281.8 on 5 and 46965 DF, p-value: < 2.2e-16
NNobs <- length(INCWAGE)
set.seed(12345)
graph_obs <- (runif(NNobs) < 0.1)
dat_graph <-subset(dat_use,graph_obs)
a<-plot(INCWAGE ~ jitter(AGE, factor = 2), pch = 16, col = rgb(0.5, 0.7, 0.8, alpha = 0.8), ylim = c(0,150000), data = dat_graph)
to_be_predicted1 <- data.frame(AGE = 25:55, female = 1, AfAm = 0, Asian = 1, Amindian = 0, race_oth = 0, Hispanic =0, educ_hs = 0, educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
to_be_predicted1$yhat <- predict(model2, newdata = to_be_predicted1)
lines(yhat ~ AGE, data = to_be_predicted1)1.At the significant level of 0.05, Age and its higher power terms and female are statistically significant.
3.By adding higher power of age, the model is improved as R squared increases.
model2a<-lm(INCWAGE~log(AGE)+I(log(AGE^2))+I(log(AGE^3))+I(log(AGE^4))+female,data=dat_use)
summary(model2a)##
## Call:
## lm(formula = INCWAGE ~ log(AGE) + I(log(AGE^2)) + I(log(AGE^3)) +
## I(log(AGE^4)) + female, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -94336 -39311 -16982 12904 594134
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -68030.3 5995.9 -11.35 <2e-16 ***
## log(AGE) 40517.3 1624.5 24.94 <2e-16 ***
## I(log(AGE^2)) NA NA NA NA
## I(log(AGE^3)) NA NA NA NA
## I(log(AGE^4)) NA NA NA NA
## female -18524.0 761.9 -24.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82180 on 46968 degrees of freedom
## Multiple R-squared: 0.02548, Adjusted R-squared: 0.02544
## F-statistic: 614 on 2 and 46968 DF, p-value: < 2.2e-16
##why take log for higher power of age dosen’t work??
## The following objects are masked _by_ .GlobalEnv:
##
## Asian, OWNCOST
## The following objects are masked from dat_NYC:
##
## AfAm, AGE, Amindian, ANCESTR1, ANCESTR1D, ANCESTR2, ANCESTR2D,
## Asian, below_150poverty, below_200poverty, below_povertyline, BPL,
## BPLD, BUILTYR2, CITIZEN, CLASSWKR, CLASSWKRD, Commute_bus,
## Commute_car, Commute_other, Commute_rail, Commute_subway, COSTELEC,
## COSTFUEL, COSTGAS, COSTWATR, DEGFIELD, DEGFIELD2, DEGFIELD2D,
## DEGFIELDD, DEPARTS, EDUC, educ_advdeg, educ_college, educ_hs,
## educ_nohs, educ_somecoll, EDUCD, EMPSTAT, EMPSTATD, FAMSIZE,
## female, foodstamps, FOODSTMP, FTOTINC, FUELHEAT, GQ,
## has_AnyHealthIns, has_PvtHealthIns, HCOVANY, HCOVPRIV, HHINCOME,
## Hisp_Cuban, Hisp_DomR, Hisp_Mex, Hisp_PR, HISPAN, HISPAND,
## Hispanic, in_Bronx, in_Brooklyn, in_Manhattan, in_Nassau, in_NYC,
## in_Queens, in_StatenI, in_Westchester, INCTOT, INCWAGE, IND,
## LABFORCE, LINGISOL, MARST, MIGCOUNTY1, MIGPLAC1, MIGPUMA1,
## MIGRATE1, MIGRATE1D, MORTGAGE, NCHILD, NCHLT5, OCC, OWNCOST,
## OWNERSHP, OWNERSHPD, POVERTY, PUMA, PWPUMA00, RACE, race_oth,
## RACED, RELATE, RELATED, RENT, ROOMS, SCHOOL, SEX, SSMC, TRANTIME,
## TRANWORK, UHRSWORK, UNITSSTR, unmarried, veteran, VETSTAT,
## VETSTATD, white, WKSWORK2, YRSUSA1
use_varb <- (AGE >= 25) & (AGE <= 55) & (LABFORCE == 2) & (WKSWORK2 > 4) & (UHRSWORK >= 35) & (Hispanic == 1) & (female == 1) & ((educ_college == 1) | (educ_advdeg == 1))
dat_use1 <- subset(acs2017_ny,use_varb) #
detach()## Explore what happens if we denote different properities under the same dummy variable as 1 and use them in the same model.
model2b <-lm(INCWAGE~AGE+I(AGE^2)+female+Asian+educ_hs+educ_somecoll+educ_college+educ_advdeg,data=dat_use1)
summary(model2b)##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + female + Asian + educ_hs +
## educ_somecoll + educ_college + educ_advdeg, data = dat_use1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -92859 -25796 -7006 15021 553413
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -77174.12 33798.14 -2.283 0.0226 *
## AGE 7794.92 1761.97 4.424 1.06e-05 ***
## I(AGE^2) -89.33 22.25 -4.015 6.34e-05 ***
## female NA NA NA NA
## Asian 12692.60 14470.50 0.877 0.3806
## educ_hs NA NA NA NA
## educ_somecoll NA NA NA NA
## educ_college -21888.58 3211.16 -6.816 1.52e-11 ***
## educ_advdeg NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51860 on 1131 degrees of freedom
## Multiple R-squared: 0.0779, Adjusted R-squared: 0.07464
## F-statistic: 23.89 on 4 and 1131 DF, p-value: < 2.2e-16
##In the original subset, educ_college and educ_advdeg are all denoted by 1. If a dummy variable has m properties, then we apply m-1 dummies in the model. Because different properties under the same dummy variables are exclusive. People will be classified by either of the property but not both. For example, “(educ_college == 1) | (educ_advdeg == 1)” if a person falls into educ_college or educ_advdge, then we fail to classify him.
Here in this example suppose we have 3 propertities in education index: educ_hs,educ_college, educ_advdge.
person educ_college educ_advdge
1: 0 0
2: 0 1
3: 1 0 4: 1 1
In this case, how should we classify person no.4? So D1(educ_college) and D2(educ_advdge) can’t take 1 at the same time.
##polynomial terms of dummy variables
model2c <-model1a <-lm(INCWAGE~AGE+I(AGE^2)+female+I(female^2)+Asian+I(Asian^2)+educ_hs+I(educ_hs^2)+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
summary(model2c)##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + female + I(female^2) +
## Asian + I(Asian^2) + educ_hs + I(educ_hs^2) + educ_somecoll +
## educ_college + educ_advdeg, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -138710 -32970 -10697 13008 624344
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.145e+05 8.101e+03 -14.135 < 2e-16 ***
## AGE 6.587e+03 4.051e+02 16.258 < 2e-16 ***
## I(AGE^2) -6.509e+01 5.003e+00 -13.012 < 2e-16 ***
## female -2.537e+04 7.182e+02 -35.325 < 2e-16 ***
## I(female^2) NA NA NA NA
## Asian -1.077e+03 1.194e+03 -0.902 0.367
## I(Asian^2) NA NA NA NA
## educ_hs 1.337e+04 1.787e+03 7.481 7.52e-14 ***
## I(educ_hs^2) NA NA NA NA
## educ_somecoll 2.576e+04 1.822e+03 14.141 < 2e-16 ***
## educ_college 6.149e+04 1.783e+03 34.493 < 2e-16 ***
## educ_advdeg 8.658e+04 1.823e+03 47.483 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76760 on 46962 degrees of freedom
## Multiple R-squared: 0.1499, Adjusted R-squared: 0.1498
## F-statistic: 1035 on 8 and 46962 DF, p-value: < 2.2e-16
##Dummy variables are denoted as 0 or 1. If we take log to 0 or 1, it is meaningless. The coefficients of those dummy variables just measure the percentage change when dummies switch from 0 to 1. Same as raising power of dunmmies. If we raise 0 and 1 to higher power, it doesn’t help to explain the model.
use_varb <- (AGE >= 25) & (AGE <= 55) & (LABFORCE == 2) & (WKSWORK2 > 4) & (UHRSWORK >= 35)&(Asian=1)
dat_use <- subset(acs2017_ny,use_varb) #
detach()
model2d<-lm(log1p(INCWAGE)~AGE+I(AGE^2)+female+Asian+educ_hs+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
summary(model2d)##
## Call:
## lm(formula = log1p(INCWAGE) ~ AGE + I(AGE^2) + female + Asian +
## educ_hs + educ_somecoll + educ_college + educ_advdeg, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.009 -4.076 1.810 3.267 16.053
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.215e-01 1.043e-01 -5.001 5.71e-07 ***
## AGE 3.212e-01 4.375e-03 73.419 < 2e-16 ***
## I(AGE^2) -4.029e-03 4.299e-05 -93.713 < 2e-16 ***
## female -6.694e-01 3.325e-02 -20.133 < 2e-16 ***
## Asian -2.619e-01 5.902e-02 -4.437 9.12e-06 ***
## educ_hs 1.840e+00 5.605e-02 32.819 < 2e-16 ***
## educ_somecoll 2.584e+00 6.053e-02 42.694 < 2e-16 ***
## educ_college 3.686e+00 6.187e-02 59.573 < 2e-16 ***
## educ_advdeg 4.129e+00 6.619e-02 62.385 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.369 on 69578 degrees of freedom
## (14237 observations deleted due to missingness)
## Multiple R-squared: 0.2825, Adjusted R-squared: 0.2824
## F-statistic: 3424 on 8 and 69578 DF, p-value: < 2.2e-16
1.If we take log of INCWAGE, we measure those variable in a percentage terms, eg. in model1a, people has some college education earn $2.576e+04 more than those who receive education less than high school(educ_ohs), while in model2d, it express as income of those who are in educ_somecoll is 128.95% higher than those in educ_ohs.
model3a:interaction of gender and age
model3a <-lm(INCWAGE~AGE+I(AGE^2)+female+I(female*AGE)+I(female*(AGE^2))+Asian+educ_hs+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
summary(model3a)##
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2) + female + I(female * AGE) +
## I(female * (AGE^2)) + Asian + educ_hs + educ_somecoll + educ_college +
## educ_advdeg, data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101281 -24550 -6536 11190 675329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.396e+04 2.026e+03 -36.496 < 2e-16 ***
## AGE 4.863e+03 8.902e+01 54.623 < 2e-16 ***
## I(AGE^2) -5.117e+01 8.927e-01 -57.326 < 2e-16 ***
## female 4.209e+04 2.748e+03 15.318 < 2e-16 ***
## I(female * AGE) -2.830e+03 1.190e+02 -23.772 < 2e-16 ***
## I(female * (AGE^2)) 2.916e+01 1.175e+00 24.810 < 2e-16 ***
## Asian -1.193e+03 8.172e+02 -1.460 0.144
## educ_hs 5.267e+03 7.763e+02 6.785 1.17e-11 ***
## educ_somecoll 1.178e+04 8.382e+02 14.049 < 2e-16 ***
## educ_college 3.652e+04 8.570e+02 42.618 < 2e-16 ***
## educ_advdeg 5.974e+04 9.171e+02 65.138 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 60490 on 69576 degrees of freedom
## (14237 observations deleted due to missingness)
## Multiple R-squared: 0.1776, Adjusted R-squared: 0.1775
## F-statistic: 1502 on 10 and 69576 DF, p-value: < 2.2e-16
Asian is insignificant. Income doesn’t has significant relation with ethnicity.
E(INCWAGE:gender*age)= -1.487e+05 +8.085e+03AGE-8.044e+01AGE^2-3.177e+03 female * AGE +3.245e+01 female * AGE^2+ 1.343e+04educ_hs+ 2.587e+04educ_somecoll + 8.641e+04educ_advdeg
2.Interpret model E(INCWAGE:GENDERAGE) interaction term: -3.177e+03 female * AGE +3.245e+01 female * AGE^2= femaleage(-3.177e+03+3.245e+01*age). This means when age increase 1 year, the wage changed will be -3.177e+03+3.245e+01*age.
E(INCWAGE:gender*age)= -1.487e+05 +8.085e+03AGE-8.044e+01AGE^2-3.177e+03 female * AGE+3.245e+01 female * AGE^2+ 8.641e+04educ_advdeg
the excepted value=E(INCWAGE:gender*age)=-1.487e+05 +8.085e+03(35)-8.044e+01(35)^2+3.245e+01 (35)^2+8.641e+04-3.177e+03 * (35)=
D1=gender
when D1=male=0 E(INCWAGE:gender*age)= -1.487e+05 +8.085e+03AGE-8.044e+01AGE^2+ 8.641e+04educ_advdeg
f’(AGE)=8.085e+03-8.044e+01*(2)AGE f’(age)=0 intercept1= -1.487e+05+04educ_advdeg
When D1=1 E(INCWAGE:gender*age)= -1.487e+05 +8.085e+03AGE-8.044e+01AGE^2-3.177e+03 female * AGE+3.245e+01 female * AGE^2+ 8.641e+04educ_advdeg
f’(AGE)=8.085e+03-8.044e+01*(2)AGE-3.177e+03+3.245e+01*(2)AGE+ 8.641e+04educ_advdeg INTERCEPT2= -1.487e+05+ 8.641e+04educ_advdeg
Conclusion: interaction of the shape of income~age of these two group are totally different(different intercepts and different slope).
## The following objects are masked _by_ .GlobalEnv:
##
## Asian, OWNCOST
use_varb <- (AGE >= 25) & (AGE <= 55) & (LABFORCE == 2) & (WKSWORK2 > 4) & (UHRSWORK >= 35)&(Asian=1)
dat_use <- subset(acs2017_ny,use_varb)
model3a<-lm(INCWAGE ~ AGE + I(AGE^2) + female + I(female*AGE) + I(female*(AGE^2))+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
NNobs <- length(INCWAGE)
set.seed(66666)
graph_obs <- (runif(NNobs) < 0.1)
dat_graph <-subset(dat_use,graph_obs)
plot(INCWAGE ~ jitter(AGE, factor = 2), pch = 16, col = rgb(0.5, 0.7, 0.8, alpha = 0.8), ylim = c(0,150000), data = dat_graph)
to_be_predicted6 <- data.frame(AGE = 25:55, female = 1, AfAm = 0, Asian =1, Amindian = 0, race_oth = 0, Hispanic =0, educ_hs = 0, educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
to_be_predicted6$yhat <- predict(model3a, newdata = to_be_predicted6)
lines(yhat ~ AGE, data = to_be_predicted6,col="red",lwd=2)
to_be_predicted7 <- data.frame(AGE = 25:55, female = 0, AfAm = 0, Asian =1, Amindian = 0, race_oth = 0, Hispanic =0, educ_hs = 0, educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
to_be_predicted7$yhat <- predict(model3a, newdata = to_be_predicted7)
lines(yhat ~ AGE, data = to_be_predicted7,col="blue",lty=3,lwd=4)
legend("topleft", c("male", "female"), lty = c(2,1), bty = "n")## gender/marriage statue interaction
model3b <-lm(log1p(INCWAGE)~AGE+I(AGE^2)+female:MARST+Asian+educ_hs+educ_somecoll+educ_college+educ_advdeg,data=dat_use)
summary(model3b)##
## Call:
## lm(formula = log1p(INCWAGE) ~ AGE + I(AGE^2) + female:MARST +
## Asian + educ_hs + educ_somecoll + educ_college + educ_advdeg,
## data = dat_use)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1731 -0.0265 0.3735 0.7737 3.8979
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2270851 0.2313890 35.555 < 2e-16 ***
## AGE 0.0615367 0.0115379 5.333 9.68e-08 ***
## I(AGE^2) -0.0006763 0.0001423 -4.754 2.00e-06 ***
## Asian -0.0784687 0.0338079 -2.321 0.0203 *
## educ_hs 0.5416555 0.0505947 10.706 < 2e-16 ***
## educ_somecoll 0.8280733 0.0515311 16.069 < 2e-16 ***
## educ_college 1.2834252 0.0504253 25.452 < 2e-16 ***
## educ_advdeg 1.5463282 0.0514825 30.036 < 2e-16 ***
## female:MARST -0.0241825 0.0047160 -5.128 2.94e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.173 on 46962 degrees of freedom
## Multiple R-squared: 0.03919, Adjusted R-squared: 0.03903
## F-statistic: 239.4 on 8 and 46962 DF, p-value: < 2.2e-16
The interaction term tells us that the income ration of female who are married spouse absent,separated ,divorced, widowed,never married earn 2.4% less than those not.