/l — title: Methods of Policy Analysis - The Future of Work subtitle: Quantitative HW Assignment - Blinder-Oaxaca author: Professor Lee Branstetter date: Spring 2026 output: pdf_document —
This quantitative HW assignment will require you to use statistical analysis software to estimate the Blinder-Oaxaca decomposition.
You will (once again) be using data from the Current Population Survey (CPS) “outgoing rotation files”, which are surveys undertaken in the final months in which a U.S. household is part of the CPS sample. These surveys record data on schooling, hourly wages, age, location, and respondent demographics. These data therefore tell us how wages vary according to education, experience, race, gender, and geography.
Two versions of this CPS data extract can be found on the class website. The first version is HW3_Data.dta, saved in STATA format for those who are familiar with the STATA statistical software package and wish to use it to complete the homework assignment. The second version is HW3_Data.csv, saved as a text file in CSV format. You should be able to read this version into R or any other statistical software package.
This dataset has already been “preprocessed” from the original raw data files to make this assignment more straightforward. Here are the data it contains:
| VARIABLE NAME | DEFINITION |
|---|---|
| IDENTIFIER | Anonymous ID code that identifies the individual respondent |
| WAGE | Estimated hourly wage, based on survey responses |
| ED | Estimated years of education based on survey responses |
| EXP | Estimated years of work experience: \(Age-Schooling-6\) |
| EXP2 | The square of EXP |
| FE | Respondent is female |
| BLACK | Respondent is Black |
| HISP | Respondent is Hispanic |
| WHITE | Respondent is White |
| OTHER | Resopndent is not White, Black, or Hispanic |
| NOHS | Respondent did not complete high school |
| POSTHS | Respondent received some post high school training |
| BACHELOR | Respondent completed a 4-year college degree |
| POSTGRAD | Respondent completed a degree beyond the 4-year bachelors’ |
Recall that one important feature of the wage data reported in the publicly disclosed CPS files is that it is “topcoded” - that is, the wages of the highest income earners are not publicly disclosed. This is done in order to protect the privacy of high income individuals whose identity might otherwise be deduced. Instead of recording the actual income of these respondents, the publicly disclosed data files record the average income of all high earning individuals.
# packages
library(tidyverse)
library(car) # for test of statistical equivalence
library(gap) # for chow test (may need to install)
## Warning: package 'gap' was built under R version 4.4.3
# suppress scientific notation
options(scipen = 4) # a handy bit of code to reduce scientific notation usage
# data
cps <- read_csv("HW3_Data.csv")
# 1.a) Mincer regression
mincer <- lm(LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP + OTHER , data=cps)
summary(mincer)
##
## Call:
## lm(formula = LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP + OTHER,
## data = cps)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5037 -0.3226 -0.0204 0.3139 2.9885
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.43840516 0.01498885 95.965 < 2e-16 ***
## ED 0.10161808 0.00093751 108.391 < 2e-16 ***
## EXP 0.02987870 0.00055036 54.289 < 2e-16 ***
## EXP2 -0.00047332 0.00001082 -43.727 < 2e-16 ***
## FE -0.22817271 0.00467635 -48.793 < 2e-16 ***
## BLACK -0.13966156 0.00780491 -17.894 < 2e-16 ***
## HISP -0.05685394 0.00704083 -8.075 6.90e-16 ***
## OTHER 0.03450051 0.00871777 3.957 7.58e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5201 on 49992 degrees of freedom
## Multiple R-squared: 0.2846, Adjusted R-squared: 0.2845
## F-statistic: 2841 on 7 and 49992 DF, p-value: < 2.2e-16
# 1.a) Extra credit
##1b b. [10 pts] Separate your data into two subsamples - one for Black workers and one for White workers. For each subsample answer the following: i. How many observations are there? ii. What are the sample means for variables \(LNWAGE, ED, EXP, EXP2\) and \(FE\). iii. What are the differences in sample means for variables \(LNWAGE, ED, EXP, EXP2\) and \(FE\). iv. What do these differences tell us about different earnings, education, and experience levels between Black and White workers? v. What is the difference in average earnings between Black males and Black females? vi. What is the difference in average earnings between White males and White females?
Black <- filter(cps, BLACK == 1)
White <- filter(cps, WHITE == 1)
[1] “Number of Black workers: 5167” [1] “Number of White workers: 33683”
# 1.b.ii) sample means
meanblack <- lapply(Black, mean)
meanwhite <- lapply(White, mean)
print("Sample means for Black observations:")
## [1] "Sample means for Black observations:"
print(meanblack[c("LNWAGE", "ED", "EXP", "EXP2", "FE")])
## $LNWAGE
## [1] 2.934341
##
## $ED
## [1] 14.07219
##
## $EXP
## [1] 22.03155
##
## $EXP2
## [1] 685.6116
##
## $FE
## [1] 0.5616412
print("Sample means for White observations")
## [1] "Sample means for White observations"
print(meanwhite[c("LNWAGE", "ED", "EXP", "EXP2", "FE")])
## $LNWAGE
## [1] 3.154221
##
## $ED
## [1] 14.65107
##
## $EXP
## [1] 23.03776
##
## $EXP2
## [1] 741.682
##
## $FE
## [1] 0.4833299
[1] “Differences are presented as White mean - Black mean” [1] “Difference in LN Wages: 0.219880036088417” [1] “Difference in Education: 0.578881381798643” [1] “Difference in Experience: 1.00621750529035” [1] “Difference in Experience ^2: 56.0704323127076” [1] “Difference in % of females: -0.0783113147725224”
# 1.b.iv) exposition (no additional code required)
print("These differences show us that on average, Black observations earn less in wages, have about about a half year less in education, about a year less in experience")
## [1] "These differences show us that on average, Black observations earn less in wages, have about about a half year less in education, about a year less in experience"
# 1.b.v) difference for Black males and females
black_fem_wage <- mean(Black$LNWAGE[Black$FE==1])
black_male_wage <- mean(Black$LNWAGE[Black$FE==0])
print(paste("Difference in average Black male and Black female ln wages:",round(black_male_wage - black_fem_wage, 3)))
## [1] "Difference in average Black male and Black female ln wages: 0.087"
# 1.b.vi) difference for White males and females
white_fem_wage <- mean(White$LNWAGE[White$FE==1])
white_male_wage <- mean(White$LNWAGE[White$FE==0])
print(paste("Difference in average white male and white female ln wages:",round(white_male_wage - white_fem_wage, 3)))
## [1] "Difference in average white male and white female ln wages: 0.211"
# Black subset
Black_reg <- lm(LNWAGE ~ ED + EXP + EXP2 + FE, data=Black)
summary(Black_reg)
##
## Call:
## lm(formula = LNWAGE ~ ED + EXP + EXP2 + FE, data = Black)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.08910 -0.30166 -0.03152 0.27593 2.82619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.25363126 0.04413973 28.401 <2e-16 ***
## ED 0.10352524 0.00295448 35.040 <2e-16 ***
## EXP 0.02523410 0.00161637 15.612 <2e-16 ***
## EXP2 -0.00037566 0.00003161 -11.886 <2e-16 ***
## FE -0.13266073 0.01370640 -9.679 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.487 on 5162 degrees of freedom
## Multiple R-squared: 0.2486, Adjusted R-squared: 0.248
## F-statistic: 427 on 4 and 5162 DF, p-value: < 2.2e-16
# White subset
white_reg <- lm(LNWAGE ~ ED + EXP + EXP2 + FE, data=White)
summary(white_reg)
##
## Call:
## lm(formula = LNWAGE ~ ED + EXP + EXP2 + FE, data = White)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5418 -0.3254 -0.0153 0.3208 2.6429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.33026285 0.01901841 69.95 <2e-16 ***
## ED 0.10908132 0.00123203 88.54 <2e-16 ***
## EXP 0.03116012 0.00068103 45.75 <2e-16 ***
## EXP2 -0.00049832 0.00001328 -37.54 <2e-16 ***
## FE -0.25338327 0.00574205 -44.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5246 on 33678 degrees of freedom
## Multiple R-squared: 0.2737, Adjusted R-squared: 0.2736
## F-statistic: 3173 on 4 and 33678 DF, p-value: < 2.2e-16
# 1.c.i) difference on education (no additional code required)
print("Holding all else constant, on average, a 1 year increase in education increases log wage by .103 log points for Black workers and .109 log points for white workers. This can be interpreted as when holding all else constant, the financial return of an additional year of schooling for white workers is larger than for Black workers")
## [1] "Holding all else constant, on average, a 1 year increase in education increases log wage by .103 log points for Black workers and .109 log points for white workers. This can be interpreted as when holding all else constant, the financial return of an additional year of schooling for white workers is larger than for Black workers"
# 1.c.ii) difference on gender by race (no additional code required)
print("The gender wage differential is larger for white observations than for Black observations. Therefore, holding all other factors constant, the gender wage differential is larger for white women compared to white men, than for Black women compared to Black men")
## [1] "The gender wage differential is larger for white observations than for Black observations. Therefore, holding all other factors constant, the gender wage differential is larger for white women compared to white men, than for Black women compared to Black men"
# 1.c.iii) difference on age profile by race (no additional code required)
print("The coefficient on experience is larger for white observations than for Black observations. Therefore, holding all else constant, aging an additional year has higher returns for white observations than Black observations")
## [1] "The coefficient on experience is larger for white observations than for Black observations. Therefore, holding all else constant, aging an additional year has higher returns for white observations than Black observations"
# 1.c.iv) extra credit: topcoding (no additional code required)
cat("Top coding obscures the returns for high income individuals. If the advantaged group was disproportionately within the observations that were top-coded. The coefficient estimations for the advantaged group would be an underestimate and therefore bias the differences in wages towards zero")
## Top coding obscures the returns for high income individuals. If the advantaged group was disproportionately within the observations that were top-coded. The coefficient estimations for the advantaged group would be an underestimate and therefore bias the differences in wages towards zero
# 1.c.v) chow test
# 1.d.i) Black-White endowment difference (weighted on White values)
white_weighted_endowment_term = diffBnWED*white_reg$coefficients["ED"] + diffBnWEXP*white_reg$coefficients["EXP"] + diffBnWEXP2*white_reg$coefficients["EXP2"]+diffBnWFE * white_reg$coefficients["FE"]
white_weighted_endowment_term
## ED
## 0.086401
# 1.d.ii) Discrimination component is the sum product of the diff in the coeffs and the avg for black endowments
coeff_diff_ed <- white_reg$coefficients["ED"]- Black_reg$coefficients["ED"]
coeff_diff_exp <- white_reg$coefficients["EXP"]- Black_reg$coefficients["EXP"]
coeff_diff_exp2 <- white_reg$coefficients["EXP2"]- Black_reg$coefficients["EXP2"]
coeff_diff_fe <- white_reg$coefficients["FE"] - Black_reg$coefficients["FE"]
white_weighted_discrimination_term <- coeff_diff_ed * meanblack$ED + coeff_diff_exp * meanblack$EXP + coeff_diff_exp2 * meanblack$EXP2 + coeff_diff_fe * meanblack$FE
cat("White weighted discrimination term:", round(white_weighted_discrimination_term,3), "\n")
## White weighted discrimination term: 0.057
cat("White weighted endowment term:", round(white_weighted_endowment_term,3), "\n")
## White weighted endowment term: 0.086
cat("Sum of White weighted endowment and discrimination:", round(sum(white_weighted_endowment_term,white_weighted_discrimination_term ),3), "\n")
## Sum of White weighted endowment and discrimination: 0.143
cat("Fraction of difference in average log wages due to discrimination:", round(abs(white_weighted_discrimination_term/diffBnWLNWAGE*100), 3), "%")
## Fraction of difference in average log wages due to discrimination: 25.854 %
# 1.d.iii) exposition (no additional code needed)
print("Through the implementation of the Blinder-Oaxaca test, differences in wages for Black and white workers are shown to be caused by discrimination and differences in endowments. The number from the BO-test is slightly larger than the differential shown in 1a because it accounts for differences in average endowments and discrimination whereas the coefficient found in 1a holds all other factors constant and assumes the same returns for the same endowments")
## [1] "Through the implementation of the Blinder-Oaxaca test, differences in wages for Black and white workers are shown to be caused by discrimination and differences in endowments. The number from the BO-test is slightly larger than the differential shown in 1a because it accounts for differences in average endowments and discrimination whereas the coefficient found in 1a holds all other factors constant and assumes the same returns for the same endowments"
# 1.e.i) endowment and discrimination (Black weighting)
black_weighted_endowment_term <- Black_reg$coefficients["ED"] * diffBnWED + Black_reg$coefficients["EXP"] * diffBnWEXP + Black_reg$coefficients["EXP2"] * diffBnWEXP2 + Black_reg$coefficients["FE"] * diffBnWFE
cat(paste("Black weighted endowment term:", round(black_weighted_endowment_term,3), "\n"))
## Black weighted endowment term: 0.075
black_weighted_discrimination_term <- meanwhite$ED * coeff_diff_ed + meanwhite$EXP * coeff_diff_exp + meanwhite$EXP2 * coeff_diff_exp2 + meanwhite$FE * coeff_diff_fe
cat(paste("Black weighted discrimination term:", round(black_weighted_discrimination_term,3), "\n"))
## Black weighted discrimination term: 0.069
cat(paste("Total log wage differential:", round(sum(black_weighted_discrimination_term,black_weighted_endowment_term),3)))
## Total log wage differential: 0.143
# 1.e.ii and 1.e.iii) exposition (no additional coding required)
cat("i.e.ii) When using equation 5.31 from the Berndt textbook, the endowment and discrimination terms result in the same total wage differential. However, the wage differential attributed to earnings is smaller and discrimination is larger.\n\n")
## i.e.ii) When using equation 5.31 from the Berndt textbook, the endowment and discrimination terms result in the same total wage differential. However, the wage differential attributed to earnings is smaller and discrimination is larger.
cat ("i.e.iii) The estimates are different because the coefficients and mean values for the advantaged group are typically larger than those for the disadvantaged group. Thus, when the discrimination term is estimated by multiplyingthe average endowments of the disadvantaged group to the differences in coefficients, we would expect to see a lower estimated value")
## i.e.iii) The estimates are different because the coefficients and mean values for the advantaged group are typically larger than those for the disadvantaged group. Thus, when the discrimination term is estimated by multiplyingthe average endowments of the disadvantaged group to the differences in coefficients, we would expect to see a lower estimated value
# 2.a.i) separate data
female <- filter(cps, FE == 1)
male <- filter(cps, FE == 0)
cat(paste("Number of female workers:", {nrow(female)}, "\n"))
## Number of female workers: 24453
cat(paste("Number of male workers:", {nrow(male)}))
## Number of male workers: 25547
# 2.a.ii) sample means
meanfemale <- lapply(female, mean)
meanmale <- lapply(male, mean)
print("Sample means for female observations:")
## [1] "Sample means for female observations:"
print(meanfemale[c("LNWAGE", "ED", "EXP", "EXP2", "FE")])
## $LNWAGE
## [1] 3.001255
##
## $ED
## [1] 14.55613
##
## $EXP
## [1] 22.35869
##
## $EXP2
## [1] 707.3425
##
## $FE
## [1] 1
print("Sample means for Male observations")
## [1] "Sample means for Male observations"
print(meanmale[c("LNWAGE", "ED", "EXP", "EXP2", "FE")])
## $LNWAGE
## [1] 3.189151
##
## $ED
## [1] 14.12088
##
## $EXP
## [1] 22.40987
##
## $EXP2
## [1] 708.7394
##
## $FE
## [1] 0
# 2.a.iii differences in male minus female (nonnegative values)
diffMmFLNWAGE <- mean(male$LNWAGE) - mean(female$LNWAGE)
diffMmFED <- mean(male$ED) - mean(female$ED)
diffMmFEXP <- mean(male$EXP) - mean(female$EXP)
diffMmFEXP2 <- mean(male$EXP2) - mean(female$EXP2)
diffMmFBLACK <- mean(male$BLACK) - mean(female$BLACK)
diffMmFHISP <- mean(male$HISP) - mean(female$HISP)
cat(paste("Diff in LN Wage:",diffMmFLNWAGE, "\n"))
## Diff in LN Wage: 0.187895950251396
cat(paste("Diff in ED:", diffMmFED, "\n"))
## Diff in ED: -0.435252832903808
cat(paste("Diff in EXP:", diffMmFEXP, "\n"))
## Diff in EXP: 0.0511839050960958
cat(paste("Diff in EXP2:", diffMmFEXP2, "\n"))
## Diff in EXP2: 1.39692944618025
cat(paste("Diff in BLACK:", diffMmFBLACK, "\n"))
## Diff in BLACK: -0.0300165283446872
cat(paste ("Diff in HISP:", diffMmFHISP))
## Diff in HISP: 0.0176246311172031
# 2.a.iv) interpretation (no additional code required)
cat("2.a.iv) On average, men in this sample are compensated more and have more experience, but they tend to be less educated.")
## 2.a.iv) On average, men in this sample are compensated more and have more experience, but they tend to be less educated.
# 2.a.v) in numbers of workers by race
cat("2.a.v) On average, compared to the females in the sample, the men in the sample are more likely to be Hispanic and less likely to be Black")
## 2.a.v) On average, compared to the females in the sample, the men in the sample are more likely to be Hispanic and less likely to be Black
# female subset
lmFemale <- lm(LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP, data=female)
summary(lmFemale)
##
## Call:
## lm(formula = LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP, data = female)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2894 -0.3074 -0.0239 0.3002 3.0199
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.19625708 0.02155332 55.502 < 2e-16 ***
## ED 0.10765793 0.00133934 80.382 < 2e-16 ***
## EXP 0.02310092 0.00076097 30.357 < 2e-16 ***
## EXP2 -0.00037147 0.00001507 -24.643 < 2e-16 ***
## FE NA NA NA NA
## BLACK -0.08978093 0.01011788 -8.873 < 2e-16 ***
## HISP -0.03860858 0.00990139 -3.899 0.0000967 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5043 on 24447 degrees of freedom
## Multiple R-squared: 0.264, Adjusted R-squared: 0.2638
## F-statistic: 1753 on 5 and 24447 DF, p-value: < 2.2e-16
# male subset
lmMale <- lm(LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP, data=male)
summary(lmMale)
##
## Call:
## lm(formula = LNWAGE ~ ED + EXP + EXP2 + FE + BLACK + HISP, data = male)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5244 -0.3334 -0.0180 0.3232 2.7701
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.44925686 0.02072869 69.915 <2e-16 ***
## ED 0.09620419 0.00130669 73.624 <2e-16 ***
## EXP 0.03649746 0.00079002 46.198 <2e-16 ***
## EXP2 -0.00057247 0.00001544 -37.082 <2e-16 ***
## FE NA NA NA NA
## BLACK -0.20660708 0.01184781 -17.438 <2e-16 ***
## HISP -0.08201122 0.00979587 -8.372 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.532 on 25541 degrees of freedom
## Multiple R-squared: 0.2782, Adjusted R-squared: 0.278
## F-statistic: 1968 on 5 and 25541 DF, p-value: < 2.2e-16
# 2.b.i) difference on education (no additional code required)
cat("Female ed coefficient:", lmFemale$coefficients["ED"], "\n")
## Female ed coefficient: 0.1076579
cat("Male ed coefficient:", lmMale$coefficients["ED"], "\n")
## Male ed coefficient: 0.09620419
cat("On average, holding all else constant, an additional year of schooling is associated with higher returns for women than for men")
## On average, holding all else constant, an additional year of schooling is associated with higher returns for women than for men
# 2.b.ii) difference on race by gender (no additional code required)
cat("Female BLACK coefficient:", lmFemale$coefficients["BLACK"], "\n")
## Female BLACK coefficient: -0.08978093
cat("Male BLACK coefficient:", lmMale$coefficients["BLACK"], "\n")
## Male BLACK coefficient: -0.2066071
cat("On average, holding all else constant, Black men have a greater wage differential compared to their white counterparts, than Black women do to their white counterparts\n\n")
## On average, holding all else constant, Black men have a greater wage differential compared to their white counterparts, than Black women do to their white counterparts
cat("Female HISP coefficient:", lmFemale$coefficients["HISP"], "\n")
## Female HISP coefficient: -0.03860858
cat("Male HISP coefficient:", lmMale$coefficients["HISP"], "\n")
## Male HISP coefficient: -0.08201122
cat("On average, holding all else constant, Hispanic men have a greater wage differential compared to their white counterparts, than Hispanic women do to their white counterparts")
## On average, holding all else constant, Hispanic men have a greater wage differential compared to their white counterparts, than Hispanic women do to their white counterparts
# 2.b.iii) difference on age profile by race (no additional code required)
cat("Female EXP coefficient:", lmFemale$coefficients["EXP"], "\n")
## Female EXP coefficient: 0.02310092
cat("Male EXP coefficient:", lmMale$coefficients["EXP"], "\n")
## Male EXP coefficient: 0.03649746
cat("On average, holding all else constant, Hispanic men have a greater wage differential compared to their white counterparts, than Hispanic women do to their white counterparts")
## On average, holding all else constant, Hispanic men have a greater wage differential compared to their white counterparts, than Hispanic women do to their white counterparts
#options(download.file.method = "wininet")
#tinytex::install_tinytex()
# 2.b.iv) chow test
y_male <- male$LNWAGE
y_female <- female$LNWAGE
x_male <- cbind(male$ED, male$EXP, male$EXP2, male$BLACK, male$HISP)
x_female <- cbind(female$ED, female$EXP, female$EXP2, female$BLACK, female$HISP)
chow.test(y_male, x_male, y_female, x_female)
## F value d.f.1 d.f.2 P value
## 449.3554 6.0000 49988.0000 0.0000
cat("\nTherefore we can reject the null hypothesis")
##
## Therefore we can reject the null hypothesis
# 2.c.i) female-male endowment difference (weighted on male)
# vector needed: coefficients from regression on male subset (already created)
male_coefficients = lmMale$coefficients[-1] #remove the intercept
male_coefficients = male_coefficients[!is.na(male_coefficients)] #remove the NA value for female
# vector needed: differences of averages (male minus female)
mean_differences <- c(diffMmFED, diffMmFEXP, diffMmFEXP2, diffMmFBLACK, diffMmFHISP)
endowment_term_male_weighted <- round(sum(mean_differences * male_coefficients),3)
# calculating endowment amount
cat("Endowment term (male weighted):", endowment_term_male_weighted)
## Endowment term (male weighted): -0.036
# 2.c.ii) female-male discrimination difference (weighted on male)
#vector of female average values
meanfemale_clean <- as.numeric(meanfemale[c("ED", "EXP", "EXP2", "BLACK", "HISP")])
#vector of diff in coefficient values (M-F)
coeff_diff_MmF <- lmMale$coefficients - lmFemale$coefficients
coeff_diff_MmF <- coeff_diff_MmF[-1]
coeff_diff_MmF = as.numeric(coeff_diff_MmF[!is.na(coeff_diff_MmF)])
intercept_diff_MmF <- lmMale$coefficients[1]- lmFemale$coefficients[1]
discrimination_term_male_weighted <- sum(round(sum(meanfemale_clean * coeff_diff_MmF), 3), intercept_diff_MmF)
#calculating discrimination amount
cat("Discrimination term (male weighted):", discrimination_term_male_weighted, "\n")
## Discrimination term (male weighted): 0.2239998
#Calculating fraction of differential due to discrimination
cat("Fraction of wage differential due to discrimination:", round(discrimination_term_male_weighted / diffMmFLNWAGE, 3)*100, "%\n")
## Fraction of wage differential due to discrimination: 119.2 %
#Exposition
cat("This number is smaller than the coefficient for the dummy variable female")
## This number is smaller than the coefficient for the dummy variable female
# 2.c.iii) exposition (no additional code needed)
cat("This slightly smaller estimate is likely due to the Mincer regression controlling for differences in endowments.")
## This slightly smaller estimate is likely due to the Mincer regression controlling for differences in endowments.
# 2.d.i) female-male endowment and discrimination components (weighted on female)
cat("Female weighted estimations\n-------------------------------\n")
## Female weighted estimations
## -------------------------------
female_coefficients = lmFemale$coefficients[-1] #remove the intercept
female_coefficients = female_coefficients[!is.na(female_coefficients)] #remove the NA value for female
# calculating endowment amount
MF_endowment_female_weight <- sum(mean_differences * female_coefficients)
cat("Endowment term:", MF_endowment_female_weight, "\n")
## Endowment term: -0.04418049
#vector of male average values
meanmale_clean <- as.numeric(meanmale[c("ED", "EXP", "EXP2", "BLACK", "HISP")])
#calculating discrimination amount
discrimination_term_female_weighted = round(sum(meanmale_clean * coeff_diff_MmF)+ intercept_diff_MmF, 3)
cat("Discrimination term:", discrimination_term_female_weighted, "\n")
## Discrimination term: 0.232
cat("Total difference:", sum(discrimination_term_female_weighted, MF_endowment_female_weight))
## Total difference: 0.1878195
# 2.d.ii and 2.d.iii) exposition (no additional code needed)
cat("2.d.ii) This estimate for the differences in ln wages to women due to discrimination is slightly larger than the previous estimation. The estimate for differences due to endowments has a greater negative effect than the previous estimation.\n\n")
## 2.d.ii) This estimate for the differences in ln wages to women due to discrimination is slightly larger than the previous estimation. The estimate for differences due to endowments has a greater negative effect than the previous estimation.
cat("2.d.iii) These findings are different. Specifically in terms of discrimination, I was surprised the female weighted discrimination term was larger, because females are slightly more endowed than males in certain categories such as education, so I would expect their average values multiplied by the difference in coefficients to produce a higher level of discrimination")
## 2.d.iii) These findings are different. Specifically in terms of discrimination, I was surprised the female weighted discrimination term was larger, because females are slightly more endowed than males in certain categories such as education, so I would expect their average values multiplied by the difference in coefficients to produce a higher level of discrimination