7/29/2015
The original data set used for this project, CPS85, comes from the 1985 Current Population Survey (CPS), which is used to collect data between census years. CPS85 contains information on wages (US dollars per hour) for both men and women, along with variables such as years of education, years of work experience, race, sex, marriage status, age, sector of work, region of residence, and union membership. We used a subset of this data set, CPS2, which excludes one outlier from the original data set.
We have four questions:
Do women, on average, make less money than men because they choose jobs in lower-paying sectors?
Do women, on average, make less money than men because they have a lower level of education?
Do women, on average, make less money than men because they have less experience?
Do women, on average, make less money than men because of their marriage status?
Variable Analysis
Sex: Factor Variable
Education: Numerical Variable
Wage: Numerical Variable
Methods to correct for confounding factor:
We used xtabs to see whether there was any association between sex and level of education.
xtabs(~sex+educ, data=CPS85)
##    educ
## sex   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##   F   0   0   0   0   2   1   9   5   5  11 110  16  23   6  34  14   9
##   M   1   1   1   1   1   4   6   7  12  16 109  21  33   7  37  10  22
We crossed "sex" and "level of education" and saw that more men than women in the workforce (9 versus 3 in this sample) have between two and seven years of education.
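The comparison can be checked directly from the table printed above. The following is a minimal sketch in base R, with the counts copied by hand from the xtabs output; the object names (`educ_years`, `female`, `male`, `low_counts`, `low_share`) are our own and are not part of the original analysis.

```r
# Counts copied from the xtabs output (CPS85), by years of education
educ_years <- 2:18
female <- c(0, 0, 0, 0, 2, 1, 9, 5, 5, 11, 110, 16, 23, 6, 34, 14, 9)
male   <- c(1, 1, 1, 1, 1, 4, 6, 7, 12, 16, 109, 21, 33, 7, 37, 10, 22)

# Workers with between two and seven years of education
low <- educ_years <= 7
low_counts <- c(F = sum(female[low]), M = sum(male[low]))
low_counts

# The same counts as a share of each sex, since the sample has more men
low_share <- low_counts / c(sum(female), sum(male))
round(100 * low_share, 2)
```

Looking at shares as well as raw counts is a small safeguard: the sample contains more men than women, so a difference in counts alone could partly reflect group size.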
Next, we used the predict function to compare men's and women's wages.
predict(modEducMale, x = 14)
## Predict wage is about 10.67,
## give or take 4.954 or so for chance variation.
predict(modEducFemale, x = 14)
## Predict wage is about 8.543,
## give or take 3.585 or so for chance variation.
We used "modEducMale" and "modEducFemale" to predict men's and women's wages, which yielded some interesting results. In the predictions we ran, the predicted wage for men was always at least $2 higher than the women's (at 14 years of education, about $10.67 versus $8.54). However, the chance variation for men's wages was considerably larger than for the women's (4.954 versus 3.585).
Then we used two histograms, one to graph the relationship between gender and education…
… and the other to graph wage vs. sex.
histogram(~wage | sex, data=CPS2)
The second histogram shows that more women make a wage of $5-$6 per hour compared to men. Also, a considerable number of men make more than $15 per hour, which can't be said for the women.
Finally, we used a density plot to show wages for males and females at each level of education.
densityplot(~wage | educ + sex, data=CPS2)
This density plot displays three variables: sex, wage, and education. From it, we concluded that more females than males have a higher level of education.
Variable Analysis
Sex: Factor Variable
Experience: Numerical Variable
Wage: Numerical Variable
Methods to correct for confounding factor:
The first R chunk I made looked at the relationship between wage and experience using an xy plot. The graph shows that, in this sample, having more experience does not mean a person receives a higher wage, so wage and experience do not appear to be related.
Next I used favstats to summarize experience by sex, and then wage by sex. On average, females had almost two more years of experience than males (1.94 years, to be specific).
favstats(exper~sex,data=CPS2)
##   sex min Q1 median Q3 max     mean       sd   n missing
## 1   F   0  9     16 28  49 18.90574 12.58663 244       0
## 2   M   0  8     14 23  55 16.96540 12.13461 289       0
For sex and wage, I looked at the mean wage for males and females. The means differ by about $2.27 per hour: males have a higher average wage than females, even though females had, on average, more experience.
favstats(wage~sex,data=CPS2)
##   sex  min     Q1 median      Q3   max     mean       sd   n missing
## 1   F 1.75 4.7175  6.735  9.8125 24.98 7.728770 4.102386 244       0
## 2   M 1.00 6.0000  8.930 13.0000 26.29 9.994913 5.285854 289       0
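The two differences quoted in this section can be reproduced from the favstats means printed above. This is only a small arithmetic sketch in base R; the means are copied by hand from the output, and the names `mean_exper`, `mean_wage`, `exper_gap`, and `wage_gap` are ours.

```r
# Group means copied from the favstats output (CPS2)
mean_exper <- c(F = 18.90574, M = 16.96540)   # years of experience
mean_wage  <- c(F = 7.728770, M = 9.994913)   # dollars per hour

# Women's average experience minus men's
exper_gap <- mean_exper[["F"]] - mean_exper[["M"]]

# Men's average wage minus women's
wage_gap <- mean_wage[["M"]] - mean_wage[["F"]]

round(c(exper_gap = exper_gap, wage_gap = wage_gap), 2)
```

The two gaps point in opposite directions, which is the reason experience alone does not look like a good explanation for the wage difference.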
This density plot shows that, on average, females have more work experience than males, especially between the 20- and 60-year marks. In that range of experience, the density for females is almost double that for males.
This xy plot shows the relationship between wage, experience, and sex. Males make more money on average than females regardless of how much experience they have, and there is a higher concentration of males than females making more money with less experience.
Variable Analysis
Sex: Factor Variable
Marriage Status: Factor Variable
Wage: Numerical Variable
Methods to correct for confounding factor:
Counts of men and women by marriage status:
| | Married | Single |
|---|---|---|
| F | 162 | 82 |
| M | 188 | 101 |

Row percentages:
| | Married | Single | Total |
|---|---|---|---|
| F | 66.39 | 33.61 | 100 |
| M | 65.05 | 34.95 | 100 |
Bar charts for Marriage Status of Men v. Women:
Both the table and the bar chart show that about the same percentage of men and women in the study are married (roughly 65% of men and 66% of women), with the remainder single.
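The percentages in the table can be recomputed from the counts. The sketch below uses base R's `prop.table` with the counts typed in by hand; the names `counts` and `row_pct` are our own.

```r
# Counts copied from the table above (CPS2)
counts <- matrix(c(162,  82,
                   188, 101),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("F", "M"),
                                 married = c("Married", "Single")))

# Row percentages: each row is divided by its own total
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
row_pct
```

Using `margin = 1` makes each row sum to 100, so the two sexes can be compared even though there are more men than women in the sample.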
favstats(wage~married,data=CPS2)
##   married  min    Q1 median Q3   max     mean       sd   n missing
## 1 Married 1.00 5.620  8.595 12 26.29 9.398486 4.925121 350       0
## 2  Single 2.01 4.585  6.500 10 25.00 8.114098 4.776275 183       0
favstats(wage~sex+married,data=CPS2)
##   sex.married  min     Q1 median     Q3   max      mean       sd   n missing
## 1   F.Married 1.75 4.8750  6.880 10.000 23.25  7.683765 3.725468 162       0
## 2   M.Married 1.00 6.8250  9.845 13.545 26.29 10.876064 5.345956 188       0
## 3    F.Single 3.35 4.5125  6.450  9.245 24.98  7.817683 4.784327  82       0
## 4    M.Single 2.01 5.0000  6.670 10.670 25.00  8.354752 4.779962 101       0
While the first code chunk shows a difference in wages between married and single people, the second (which breaks it down further by sex) shows an even larger difference. According to the data, men have higher average wages than women, regardless of their marriage status.
with(CPS2,tapply(wage,INDEX=list(married,sex),FUN=mean))
##                F         M
## Married 7.683765 10.876064
## Single  7.817683  8.354752
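To see how the wage gap differs by marriage status, the means above can be subtracted within each row. This is a minimal sketch with the values copied by hand from the tapply output; the names `mean_wage` and `gap` are ours.

```r
# Mean wages copied from the tapply output (CPS2)
mean_wage <- matrix(c(7.683765, 10.876064,
                      7.817683,  8.354752),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(married = c("Married", "Single"),
                                    sex = c("F", "M")))

# Men's mean wage minus women's, within each marriage status
gap <- mean_wage[, "M"] - mean_wage[, "F"]
round(gap, 2)
```

The gap is about $3.19 per hour among married people and about $0.54 among single people, so men earn more on average in both groups, but by very different amounts.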
The box plot shows that single men and women had very similar median wages but different spreads (men having a larger IQR and a larger range between the minimum and maximum values). Among married people, however, men's median wage is much higher than women's, and the range of men's wages is much wider as well.
Variable Analysis
Sex: Factor Variable
Job Sector: Factor Variable
Wage: Numerical Variable
Methods to correct for confounding factor:
The number of males and females in each sector, and the percentage of males and females in each sector:
sexSector <- xtabs(~sex+sector, data=CPS2)
sexSector
##    sector
## sex clerical const manag manuf other prof sales service
##   F       76     0    20    24     6   52    17      49
##   M       21    20    34    44    62   53    21      34
rowPerc(sexSector)
##    sector
## sex clerical const manag manuf other  prof sales service  Total
##   F    31.15  0.00  8.20  9.84  2.46 21.31  6.97   20.08 100.00
##   M     7.27  6.92 11.76 15.22 21.45 18.34  7.27   11.76 100.00
You can use favstats to look at the mean wage for each sector:
favstats(wage~sector,data=CPS2)
##     sector  min     Q1 median      Q3   max      mean       sd   n missing
## 1 clerical 3.00 5.2000  7.500  9.5000 15.03  7.422577 2.699018  97       0
## 2    const 3.75 7.2250  9.750 11.6275 15.00  9.502000 3.343877  20       0
## 3    manag 1.00 7.1250 10.620 15.8550 26.29 12.115185 6.244713  54       0
## 4    manuf 3.00 4.9250  6.750  9.8725 22.20  8.036029 4.117607  68       0
## 5    other 2.85 5.0000  6.940 10.8150 26.00  8.500588 4.601049  68       0
## 6     prof 4.35 7.5000 10.610 15.3800 24.98 11.947429 5.523833 105       0
## 7    sales 3.35 4.3125  5.725 10.8325 19.98  7.592632 4.232272  38       0
## 8  service 1.75 3.9650  5.500  8.0000 25.00  6.537470 3.673278  83       0
Here is a density plot showing wages for the different sectors:
From the row percents and the density plots, it appears that women often do choose different sectors than men, and we can see that some sectors have higher wages than others. The null hypothesis states that this is why women have lower wages. To test the null hypothesis, we have to compare the wages of men and women in the same sector.
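One rough way to gauge how much the sector mix alone could explain is to ask what each sex would earn on average if everyone were paid the overall mean wage of their sector. This calculation is our own illustration and was not part of the original analysis. It uses only the counts and sector means printed above, copied by hand, and all object names are ours.

```r
# Counts by sector copied from the xtabs output (CPS2)
sectors <- c("clerical", "const", "manag", "manuf",
             "other", "prof", "sales", "service")
n_female <- setNames(c(76, 0, 20, 24, 6, 52, 17, 49), sectors)
n_male   <- setNames(c(21, 20, 34, 44, 62, 53, 21, 34), sectors)

# Overall mean wage per sector copied from the favstats output
sector_mean <- setNames(c(7.422577, 9.502000, 12.115185, 8.036029,
                          8.500588, 11.947429, 7.592632, 6.537470), sectors)

# Average wage each sex would have if everyone earned their sector's mean
expected_female <- weighted.mean(sector_mean, n_female)
expected_male   <- weighted.mean(sector_mean, n_male)

# Gap implied by sector mix alone, next to the observed gap in mean wages
sector_gap   <- expected_male - expected_female
observed_gap <- 9.994913 - 7.728770
round(c(sector_gap = sector_gap, observed_gap = observed_gap), 2)
```

Under this simplification the sector mix accounts for a gap of roughly half a dollar per hour, well short of the observed difference of about $2.27, which is consistent with the decision to compare men and women within the same sector.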
Density plot comparing the wages of each sex for the different sectors:
Looking at the density plot, you can clearly see that males make more than females in every sector where both are present (there are no females in construction in this sample). In the female panels most of the data are clustered in one place, which means that most females make around the same wage. For the males, the data are more spread out and stretch farther, showing that males earn a wider variety of wages, and usually more than the females.
In conclusion…
Possible reasons why?
It might be interesting to study the other variables from the CPS85 data set or to compare/contrast with a similar study taken from another country.