The original data set used for this project, CPS85, comes from the 1985 Current Population Survey (CPS), which is used to collect data in between census years. CPS85 contains information on the hourly wages (in US dollars) of men and women, along with variables such as years of education, years of work experience, race, sex, marital status, age, sector of work, region of residence, and union membership. We used a subset of this data set, CPS2, which excludes an outlier from the original data set. For our project, our questions were:
Variable Analysis: Wage –> Numerical; Sex –> Factor; Sector –> Factor
To find and correct for the possible confounding factor work sector, I planned to use:
Variable Analysis: Education –> Numerical
To find and correct for the possible confounding factor education, I planned to use:
Variable Analysis: Experience –> Numerical
To find and correct for the possible confounding factor experience, I planned to use:
Variable Analysis: Marriage Status –> Factor
To find and correct for the possible confounding factor marriage status, I planned to use:
favstats(wage~sex,data=CPS2)
## sex min Q1 median Q3 max mean sd n missing
## 1 F 1.75 4.7175 6.735 9.8125 24.98 7.728770 4.102386 244 0
## 2 M 1.00 6.0000 8.930 13.0000 26.29 9.994913 5.285854 289 0
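As a rough check on whether the raw gap in mean wages could be due to chance alone, a Welch two-sample t statistic can be computed directly from the summary statistics above. This is only a sketch that was not part of the original analysis: the variable names are made up for illustration, the numbers are copied from the favstats output, and it ignores every possible confounding factor.

```r
# Summary statistics copied from the favstats(wage ~ sex) output above.
mean_f <- 7.728770; sd_f <- 4.102386; n_f <- 244
mean_m <- 9.994913; sd_m <- 5.285854; n_m <- 289

# Raw difference in mean hourly wage (men minus women).
gap <- mean_m - mean_f

# Welch standard error: the two groups are allowed to have different variances.
var_f <- sd_f^2 / n_f
var_m <- sd_m^2 / n_m
se_gap <- sqrt(var_f + var_m)

# t statistic, Welch-Satterthwaite degrees of freedom, and two-sided p-value.
t_stat <- gap / se_gap
df_welch <- (var_f + var_m)^2 / (var_f^2 / (n_f - 1) + var_m^2 / (n_m - 1))
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(gap = gap, se = se_gap, t = t_stat, df = df_welch), 3)
p_value
```

A small p-value here only says that the raw gap is unlikely to be chance variation. It says nothing about why the gap exists, which is what the confounding-factor analyses are for.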
The number of males and females in each sector, and the percentage of each sex working in each sector:
sexSector<-xtabs(~sex+sector,data=CPS2)
sexSector
## sector
## sex clerical const manag manuf other prof sales service
## F 76 0 20 24 6 52 17 49
## M 21 20 34 44 62 53 21 34
rowPerc(sexSector)
## sector
## sex clerical const manag manuf other prof sales service Total
## F 31.15 0.00 8.20 9.84 2.46 21.31 6.97 20.08 100.00
## M 7.27 6.92 11.76 15.22 21.45 18.34 7.27 11.76 100.00
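The row percentages can be reproduced in base R from the counts alone, and a chi-square test gives a formal check on whether sex and sector are associated. This is a sketch rather than part of the original analysis: `sectorCounts`, `rowPct`, and `sectorTest` are illustrative names, and the counts are typed in from the xtabs output above.

```r
# Counts copied from the xtabs(~sex+sector) output above.
sectorCounts <- matrix(
  c(76,  0, 20, 24,  6, 52, 17, 49,
    21, 20, 34, 44, 62, 53, 21, 34),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("F", "M"),
                  sector = c("clerical", "const", "manag", "manuf",
                             "other", "prof", "sales", "service"))
)

# Row percentages: share of each sex working in each sector.
rowPct <- round(100 * prop.table(sectorCounts, margin = 1), 2)
rowPct

# Chi-square test of independence between sex and sector.
sectorTest <- chisq.test(sectorCounts)
sectorTest$expected   # expected counts if sex and sector were independent
sectorTest$p.value
```

The expected counts are worth a look before trusting the p-value, since the construction column contains an observed count of zero for women.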
You can use favstats to look at the mean wage for each sector:
favstats(wage~sector,data=CPS2)
## sector min Q1 median Q3 max mean sd n missing
## 1 clerical 3.00 5.2000 7.500 9.5000 15.03 7.422577 2.699018 97 0
## 2 const 3.75 7.2250 9.750 11.6275 15.00 9.502000 3.343877 20 0
## 3 manag 1.00 7.1250 10.620 15.8550 26.29 12.115185 6.244713 54 0
## 4 manuf 3.00 4.9250 6.750 9.8725 22.20 8.036029 4.117607 68 0
## 5 other 2.85 5.0000 6.940 10.8150 26.00 8.500588 4.601049 68 0
## 6 prof 4.35 7.5000 10.610 15.3800 24.98 11.947429 5.523833 105 0
## 7 sales 3.35 4.3125 5.725 10.8325 19.98 7.592632 4.232272 38 0
## 8 service 1.75 3.9650 5.500 8.0000 25.00 6.537470 3.673278 83 0
Here is a density plot showing wages for the different sectors:
densityplot(~wage|sector,data=CPS2,main="Wages, by Sector")
From the row percents and the density plots, it appears that women often work in different sectors than men, and we can see that some sectors have higher wages than others. The null hypothesis states that this difference in sectors is why women have lower wages. To test the null hypothesis, we have to compare the wages of men and women within the same sector.
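One way to get a feel for how much of the wage gap sector choice alone could explain is to pretend that everyone earns exactly the overall mean wage of their sector, and then average those sector means using each sex's own sector counts. This is a back-of-the-envelope sketch under that strong assumption, not the method used in this report. The names are illustrative, and the counts and means are copied from the xtabs and favstats outputs above.

```r
# Sector order: clerical, const, manag, manuf, other, prof, sales, service.
n_f <- c(76,  0, 20, 24,  6, 52, 17, 49)   # women per sector
n_m <- c(21, 20, 34, 44, 62, 53, 21, 34)   # men per sector

# Overall mean wage in each sector, from favstats(wage ~ sector).
sector_mean <- c(7.422577, 9.502000, 12.115185, 8.036029,
                 8.500588, 11.947429, 7.592632, 6.537470)

# Mean wage each sex would have if sector were the only thing that mattered.
comp_f <- weighted.mean(sector_mean, w = n_f)
comp_m <- weighted.mean(sector_mean, w = n_m)

# Gap produced by sector mix alone, next to the observed raw gap.
comp_gap <- comp_m - comp_f
observed_gap <- 9.994913 - 7.728770
round(c(sector_only = comp_gap, observed = observed_gap), 2)
```

Under this assumption the sector mix accounts for roughly half a dollar of the $2.27 raw gap, which is consistent with the need to compare men and women within the same sector.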
Density plot comparing the wages of each sex for the different sectors:
densityplot(~wage|sector+sex,data=CPS2,main="Wages of Sectors, by Sex",xlab="Wages")
Looking at the density plot, males appear to make more than females in nearly every sector. No comparison is possible for construction, because there are no women in that sector in this sample. The plots for females have most of the data clustered in one place, which means that most females make around the same wage. For the males, the data are more spread out and stretch farther to the right, showing that males earn a wider range of wages and usually more than the females.
modES<-lm(wage~sex+educ+sector,data=CPS2)
summary(modES)
##
## Call:
## lm(formula = wage ~ sex + educ + sector, data = CPS2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.5778 -2.7375 -0.6488 2.1090 16.2430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.57528 1.23998 0.464 0.643
## sexM 2.01743 0.41416 4.871 1.47e-06 ***
## educ 0.49548 0.09023 5.491 6.24e-08 ***
## sectorconst 1.38474 1.10722 1.251 0.212
## sectormanag 3.03940 0.75208 4.041 6.11e-05 ***
## sectormanuf 0.61040 0.71542 0.853 0.394
## sectorother 0.22762 0.74052 0.307 0.759
## sectorprof 2.60553 0.65207 3.996 7.37e-05 ***
## sectorsales -0.64304 0.82362 -0.781 0.435
## sectorservice -0.61294 0.65187 -0.940 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 523 degrees of freedom
## Multiple R-squared: 0.2657, Adjusted R-squared: 0.2531
## F-statistic: 21.03 on 9 and 523 DF, p-value: < 2.2e-16
This estimates that, after adjusting for education and sector, men make about two dollars an hour more than women, give or take about 41 cents for the standard error. This estimate is 4.87 standard errors above the value of zero that the null hypothesis would expect. The probability of getting an estimate at least this extreme, if the null is correct, is about 1.47e-06, or about 0.000147 percent. We can reject the null in favor of the alternative.
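The figures for the sexM coefficient can be re-derived from the regression table, which also gives a 95% confidence interval for the adjusted wage gap. This is a sketch using only the numbers printed in the summary output. The variable names are illustrative, and because the printed estimate and standard error are rounded, the results will match the table only approximately.

```r
# Values copied from the sexM row of summary(modES).
est <- 2.01743       # estimated male-minus-female wage gap, dollars per hour
se <- 0.41416        # standard error of that estimate
df_resid <- 523      # residual degrees of freedom

# t statistic and two-sided p-value under the null of no gap.
t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# The same p-value expressed as a percentage (multiply the proportion by 100).
p_percent <- 100 * p_value

# 95% confidence interval for the adjusted gap.
ci <- est + c(-1, 1) * qt(0.975, df = df_resid) * se

round(t_value, 3)
p_value
p_percent
round(ci, 2)
```

The interval stays well above zero, which agrees with rejecting the null hypothesis.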
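We used xtabs to see whether there was any association between sex and level of education. Note that this table was built from the full CPS85 data set, so it includes the one observation that is excluded from CPS2 (245 women here, compared with 244 in CPS2).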
xtabs(~sex+educ, data=CPS85)
## educ
## sex 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## F 0 0 0 0 2 1 9 5 5 11 110 16 23 6 34 14 9
## M 1 1 1 1 1 4 6 7 12 16 109 21 33 7 37 10 22
We crossed “sex” and “level of education” and saw that more men than women in the workforce have between two and seven years of education (9 men compared with 3 women).
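The counts in the table are enough to work out the average years of education for each sex without going back to the raw data. The sketch below does that in base R. The names are illustrative, and the counts are typed in from the xtabs output above, so they describe the full CPS85 data set rather than CPS2.

```r
# Years of education and the number of women and men at each level,
# copied from the xtabs(~sex+educ) output above.
educ    <- 2:18
count_f <- c(0, 0, 0, 0, 2, 1, 9, 5,  5, 11, 110, 16, 23, 6, 34, 14,  9)
count_m <- c(1, 1, 1, 1, 1, 4, 6, 7, 12, 16, 109, 21, 33, 7, 37, 10, 22)

# Mean years of education, weighting each level by how many people have it.
mean_educ_f <- weighted.mean(educ, w = count_f)
mean_educ_m <- weighted.mean(educ, w = count_m)
round(c(F = mean_educ_f, M = mean_educ_m), 2)

# Number of people with seven or fewer years of education.
low_f <- sum(count_f[educ <= 7])
low_m <- sum(count_m[educ <= 7])
c(F = low_f, M = low_m)
```

Both means come out at about 13 years, so the difference between the sexes shows up mainly in the tails of the distribution rather than in the average.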
CPSfemale <- subset(CPS2, sex == "F")
CPSmale <- subset(CPS2, sex == "M")
modEducFemale <- lmGC(wage ~ educ, data = CPSfemale)
modEducMale <- lmGC(wage ~ educ, data=CPSmale)
Next, we used the predict function to compare men’s and women’s predicted wages at 14 years of education.
predict(modEducMale, x = 14)
## Predict wage is about 10.67,
## give or take 4.954 or so for chance variation.
predict(modEducFemale, x = 14)
## Predict wage is about 8.543,
## give or take 3.585 or so for chance variation.
We used “modEducMale” and “modEducFemale” to predict men’s and women’s wages, which yielded some interesting results. At 14 years of education, the predicted wage for men ($10.67) is about $2.13 higher than the predicted wage for women ($8.54). However, the chance variation for men’s wages (about $4.95) was considerably higher than for women’s (about $3.59).
Then we used two histograms, one to graph the relationship between gender and education…
histogram(~educ | sex, data=CPS2)
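The first histogram compares men and women based on their level of education. The two distributions are very similar: the counts in the table above work out to an average of about 13 years of education for each sex, with fewer women than men at the lowest levels. Women in this sample are therefore about as educated as men on average.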
… and the other to graph wage vs. sex.
histogram(~wage | sex, data=CPS2)
The second histogram shows that more women make a wage of $5-$6 per hour compared to men. Also, a considerable number of men make more than $15 per hour, which can’t be said for the women.
Finally, we used a density plot to show wages broken down by sex and level of education.
densityplot(~wage | educ + sex, data=CPS2)
This density plot displays three different variables: sex, wage, and education. Our results suggest that females in this sample are at least as likely as males to have a higher level of education.
The first R chunk I made was to see the relationship between wage and experience, using an xy plot. The graph shows that, in this sample, having more experience does not necessarily mean receiving a higher wage. Wage and experience do not appear to be strongly related.
xyplot(wage~exper,data=CPS2,
xlab="Experience",
ylab="Wage",
main="Wage and Experience Relationship")
Next I used favstats to summarize experience by sex, and then wage by sex. On average, females had almost two more years of experience than males, specifically 1.94 years.
For sex and wage, I looked at the mean for males and females. This showed a difference of $2.27 per hour. Males have a higher average wage than females, even though females had, on average, more experience.
favstats(exper~sex,data=CPS2)
## sex min Q1 median Q3 max mean sd n missing
## 1 F 0 9 16 28 49 18.90574 12.58663 244 0
## 2 M 0 8 14 23 55 16.96540 12.13461 289 0
favstats(wage~sex,data=CPS2)
## sex min Q1 median Q3 max mean sd n missing
## 1 F 1.75 4.7175 6.735 9.8125 24.98 7.728770 4.102386 244 0
## 2 M 1.00 6.0000 8.930 13.0000 26.29 9.994913 5.285854 289 0
This is the density plot made to show the relationship between sex and work experience. It suggests that females tend to have more work experience than males, especially in the range of 20 to 60 years of experience, where the density for females appears to be almost double that for males.
densityplot(~exper|sex,data=CPS2,
main="Relationship between Work Experience and Sex",
xlab="Experience (years)",
layout=c(1,2))
This is an xy plot made to show the relationship between wage, experience, and sex. The graph shows that males make more money on average than females, regardless of how much experience they have. There is a higher concentration of males making more money with less experience than females.
xyplot(wage~exper|sex,data=CPS2,
main="Wage and Experience Relationship in Sex",
xlab="Experience",
ylab="Wage",
groups=sex)
Table for Marriage Status Men v. Women:
WageAMar <- xtabs(~sex+married,data=CPS2)
WageAMar2 <- rowPerc(xtabs(~sex+married,data=CPS2))
kable(WageAMar)
| | Married | Single |
|---|---|---|
| F | 162 | 82 |
| M | 188 | 101 |
kable(WageAMar2)
| | Married | Single | Total |
|---|---|---|---|
| F | 66.39 | 33.61 | 100 |
| M | 65.05 | 34.95 | 100 |
Bar charts for Marriage Status of Men v. Women:
barchartGC(~sex+married,data=CPS2,main="Marriage Status")
barchartGC(~sex+married,data=CPS2,main="Marriage Status",type="percent")
Both the table and the bar chart show that about the same percentage of men and women in the study are married (roughly 66 percent of women and 65 percent of men).
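A two-sample test of proportions is one way to check whether the small difference in the married share could be chance variation. This is a sketch that was not part of the original analysis. It uses base R's prop.test with the counts from the table above, and the object names are illustrative.

```r
# Counts copied from the marriage status table above.
married <- c(F = 162, M = 188)   # married women and men
total   <- c(F = 244, M = 289)   # all women and men in CPS2

# Percentage married within each sex.
round(100 * married / total, 2)

# Test whether the proportion married is the same for both sexes.
marTest <- prop.test(x = married, n = total)
marTest$estimate
marTest$p.value
```

A large p-value here would support treating marriage status as roughly balanced between the sexes in this sample.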
Data Summary for Wage & Marriage Status:
favstats(wage~married,data=CPS2)
## married min Q1 median Q3 max mean sd n missing
## 1 Married 1.00 5.620 8.595 12 26.29 9.398486 4.925121 350 0
## 2 Single 2.01 4.585 6.500 10 25.00 8.114098 4.776275 183 0
Now including sex:
favstats(wage~sex+married,data=CPS2)
## sex.married min Q1 median Q3 max mean sd n
## 1 F.Married 1.75 4.8750 6.880 10.000 23.25 7.683765 3.725468 162
## 2 M.Married 1.00 6.8250 9.845 13.545 26.29 10.876064 5.345956 188
## 3 F.Single 3.35 4.5125 6.450 9.245 24.98 7.817683 4.784327 82
## 4 M.Single 2.01 5.0000 6.670 10.670 25.00 8.354752 4.779962 101
## missing
## 1 0
## 2 0
## 3 0
## 4 0
While the first code chunk shows that married people earn more on average than single people, the second (which breaks it down further by sex) shows that this difference is not the same for men and women. According to the data, men make higher wages than women on average, regardless of their marriage status.
Mean Wages of Single v. Married Men and Women:
with(CPS2,tapply(wage,INDEX=list(married,sex),FUN=mean))
## F M
## Married 7.683765 10.876064
## Single 7.817683 8.354752
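Married men have a higher average wage than single men ($10.88 compared with $8.35), while married women have a slightly lower average wage than single women ($7.68 compared with $7.82). Because the survey records each person only once, this compares different people rather than the same people before and after marriage. The pattern could be due to men being seen as the providers in society’s view of marriage, so they are expected to make more, while the wife is expected to make less because she is not the “provider”.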
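The comparison of the four group means can be laid out explicitly by computing the marriage gap within each sex and the sex gap within each marital status. This is a sketch in base R using the means from the tapply output above, with illustrative object names.

```r
# Mean wages copied from the tapply output above.
meanWage <- matrix(
  c(7.683765, 10.876064,
    7.817683,  8.354752),
  nrow = 2, byrow = TRUE,
  dimnames = list(married = c("Married", "Single"), sex = c("F", "M"))
)

# Married minus single, separately for women and men.
marriage_gap <- meanWage["Married", ] - meanWage["Single", ]

# Men minus women, separately for married and single people.
sex_gap <- meanWage[, "M"] - meanWage[, "F"]

# How much larger the marriage gap is for men than for women.
gap_difference <- unname(marriage_gap["M"] - marriage_gap["F"])

round(marriage_gap, 2)
round(sex_gap, 2)
round(gap_difference, 2)
```

The sex gap is much wider among married people than among single people, which is the same pattern described above, seen from the other direction.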
Box & Whiskers Plot:
bwplot(wage ~ sex | married, data=CPS2,
main="Wages vs. Sex, by Marital Status", ylab="Wages (dollars per hour)",
xlab = "Sex")
The box plot shows that single men and women had very similar median wages but different ranges (men having a larger IQR and a larger range between the min and max values). Among married people, however, men’s median wage is considerably higher than women’s, and their range is considerably wider as well.
In conclusion, males in this study make higher wages than females on average, no matter which added variable we considered. Although women do tend to work in different sectors than men, when comparing wages in the same sector women tend to receive lower wages than their male counterparts. Females on average have more work experience, but regardless of work experience, females are still making lower wages than males. Based on these results, we can say that even when females have the same level of education as males, they tend to receive a lower hourly wage. Marriage status is associated with different average wages for both males and females: married men make higher wages on average than single men, while married women make slightly less on average than single women.
It is hard to say exactly what the reasons are behind the wage difference between men and women. Although they are most likely not the only reason, the values of society regarding sex are certainly a possibility. In general, women are placed below men in social status and “worth”, while men are seen as more “valuable”. Men are seen as the harder workers, the “providers” for the family, the stronger of the two sexes. Men are seen to do more work, or more important work, so in response they are given higher wages, even if women are performing the same job or have the same education or work experience.
Our group never looked into some of the other variables from the original data set CPS85, such as age, race, region, and union membership. It would be interesting to add these variables in a further study. It would also be interesting to add other variables to the data set, such as the presence of a disability, to see if there is a difference in wages, or even to compare the CPS85 data set with a similar one taken from another country to check for differences.