Introduction

The original data set used for this project, CPS85, comes from the 1985 Current Population Survey (CPS), which is used to collect data between census years. The CPS85 data set contains information on the wages (US dollars per hour) of both men and women, and it includes variables such as years of education and work experience, race, sex, marriage status, age, sector of work, region of residence, and union membership. We used a subset of this data set, CPS2, which excludes one outlier from the original data set. For our project, our questions were:

Methods

Confounding Factor: Sector

Variable Analysis: Wage –> Numerical, Sex –> Factor, Sector –> Factor

To find and correct for the possible confounding factor work sector, I planned to use:

  • Numerical:
    • xtabs()
    • favstats()
    • lm()
  • Graphical:
    • Density plot

Confounding Factor: Education

Variable Analysis: Education –> Numerical

To find and correct for the possible confounding factor education, I planned to use:

  • Numerical:
    • xtabs()
    • predict()
    • chisqtestGC()
  • Graphical:
    • Histogram
    • Density plot

Confounding Factor: Experience

Variable Analysis: Experience –> Numerical

To find and correct for the possible confounding factor experience, I planned to use:

  • Numerical:
    • favstats()
  • Graphical:
    • Scatter plot
    • Density plot

Confounding Factor: Marriage Status

Variable Analysis: Marriage Status –> Factor

To find and correct for the possible confounding factor marriage status, I planned to use:

  • Numerical:
    • xtabs()
    • favstats()
    • tapply()
  • Graphical:
    • Bar chart
    • Box and Whisker Plot

Results

favstats(wage~sex,data=CPS2)            
##   sex  min     Q1 median      Q3   max     mean       sd   n missing
## 1   F 1.75 4.7175  6.735  9.8125 24.98 7.728770 4.102386 244       0
## 2   M 1.00 6.0000  8.930 13.0000 26.29 9.994913 5.285854 289       0

Confounding Factor: Sector

The number of males and females in each sector, and the percentage of each sex working in each sector:

sexSector<-xtabs(~sex+sector,data=CPS2)
sexSector
##    sector
## sex clerical const manag manuf other prof sales service
##   F       76     0    20    24     6   52    17      49
##   M       21    20    34    44    62   53    21      34
rowPerc(sexSector)
##    sector
## sex clerical  const  manag  manuf  other   prof  sales service  Total
##   F    31.15   0.00   8.20   9.84   2.46  21.31   6.97   20.08 100.00
##   M     7.27   6.92  11.76  15.22  21.45  18.34   7.27   11.76 100.00
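The row percentages can be cross-checked from the counts alone. The following is a minimal sketch in base R, assuming only the counts printed by xtabs() above; the object names counts and rowPct are ours and are not part of the original analysis.

```r
# Counts copied by hand from the xtabs() output (rows: sex, columns: sector)
counts <- matrix(
  c(76,  0, 20, 24,  6, 52, 17, 49,
    21, 20, 34, 44, 62, 53, 21, 34),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex    = c("F", "M"),
    sector = c("clerical", "const", "manag", "manuf",
               "other", "prof", "sales", "service")
  )
)

# Divide each row by its row total, then express as a percentage
rowPct <- round(100 * prop.table(counts, margin = 1), 2)
rowPct
```

Each row of rowPct sums to 100 (up to rounding), which is what rowPerc() reports in its Total column.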

You can use favstats to look at the mean wage for each sector:

favstats(wage~sector,data=CPS2)
##     sector  min     Q1 median      Q3   max      mean       sd   n missing
## 1 clerical 3.00 5.2000  7.500  9.5000 15.03  7.422577 2.699018  97       0
## 2    const 3.75 7.2250  9.750 11.6275 15.00  9.502000 3.343877  20       0
## 3    manag 1.00 7.1250 10.620 15.8550 26.29 12.115185 6.244713  54       0
## 4    manuf 3.00 4.9250  6.750  9.8725 22.20  8.036029 4.117607  68       0
## 5    other 2.85 5.0000  6.940 10.8150 26.00  8.500588 4.601049  68       0
## 6     prof 4.35 7.5000 10.610 15.3800 24.98 11.947429 5.523833 105       0
## 7    sales 3.35 4.3125  5.725 10.8325 19.98  7.592632 4.232272  38       0
## 8  service 1.75 3.9650  5.500  8.0000 25.00  6.537470 3.673278  83       0

Here is a density plot showing wages for the different sectors:

densityplot(~wage|sector,data=CPS2,main="Wages, by Sector")

From the row percents and the density plots, it appears that women often work in different sectors than men, and that some sectors pay higher wages than others. The null hypothesis says this difference in sectors is why women have lower wages. To reject the null hypothesis, we have to compare the wages of men and women in the same sector.
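To see the possible confounding in one place, the sector counts and the sector mean wages can be put side by side. This is only a sketch in base R built from the numbers printed above; the data frame sectors and its column names are our own labels.

```r
# Values copied by hand from the xtabs() and favstats() output above
sectors <- data.frame(
  sector   = c("clerical", "const", "manag", "manuf",
               "other", "prof", "sales", "service"),
  nF       = c(76,  0, 20, 24,  6, 52, 17, 49),
  nM       = c(21, 20, 34, 44, 62, 53, 21, 34),
  meanWage = c(7.422577, 9.502000, 12.115185, 8.036029,
               8.500588, 11.947429, 7.592632, 6.537470)
)

# Share of each sector's workers who are female
sectors$n         <- sectors$nF + sectors$nM
sectors$pctFemale <- round(100 * sectors$nF / sectors$n, 1)

# Sort from highest-paying to lowest-paying sector
sectors[order(-sectors$meanWage), c("sector", "pctFemale", "meanWage")]
```

Reading down the sorted table shows whether the sectors with a high share of women tend to sit near the bottom of the wage ranking. It does not by itself separate the effect of sector from the effect of sex.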

Density plot comparing the wages of each sex for the different sectors:

densityplot(~wage|sector+sex,data=CPS2,main="Wages of Sectors, by Sex",xlab="Wages")

Looking at the density plot, males appear to make more than females in every sector where both are present (there are no females in the construction sector). The plots for females have most of the data clustered in one place, which means that most females make around the same wage. For the males, the data is more spread out and stretches farther, showing that males make a wider variety of wages, and usually more than the females.

modES<-lm(wage~sex+educ+sector,data=CPS2)
summary(modES)
## 
## Call:
## lm(formula = wage ~ sex + educ + sector, data = CPS2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5778  -2.7375  -0.6488   2.1090  16.2430 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.57528    1.23998   0.464    0.643    
## sexM           2.01743    0.41416   4.871 1.47e-06 ***
## educ           0.49548    0.09023   5.491 6.24e-08 ***
## sectorconst    1.38474    1.10722   1.251    0.212    
## sectormanag    3.03940    0.75208   4.041 6.11e-05 ***
## sectormanuf    0.61040    0.71542   0.853    0.394    
## sectorother    0.22762    0.74052   0.307    0.759    
## sectorprof     2.60553    0.65207   3.996 7.37e-05 ***
## sectorsales   -0.64304    0.82362  -0.781    0.435    
## sectorservice -0.61294    0.65187  -0.940    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.242 on 523 degrees of freedom
## Multiple R-squared:  0.2657, Adjusted R-squared:  0.2531 
## F-statistic: 21.03 on 9 and 523 DF,  p-value: < 2.2e-16

This estimates that, after adjusting for education and sector, men make about two dollars an hour more than women, give or take about 40 cents for standard error. This estimate is 4.87 standard errors above the value the null hypothesis would expect (zero). The probability of getting an estimate at least this far from zero, if the null is correct, is about 1.47e-06, or about 0.000147 percent. We can reject the null in favor of the alternative.
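The t value and p-value for sexM can be recomputed from the estimate, its standard error, and the residual degrees of freedom. The sketch below uses only base R and the numbers printed by summary(modES); the variable names are ours, and the 95% confidence interval at the end is an extra step that was not part of the original analysis.

```r
# Numbers copied by hand from the sexM row of summary(modES)
est <- 2.01743   # estimated male-female wage gap (dollars per hour)
se  <- 0.41416   # standard error of the estimate
df  <- 523       # residual degrees of freedom

# t statistic: how many standard errors the estimate is from zero
tStat <- est / se

# Two-sided p-value from the t distribution
pValue <- 2 * pt(-abs(tStat), df = df)

# The same p-value expressed as a percentage (multiply by 100)
pPercent <- 100 * pValue

# Approximate 95% confidence interval for the gap (our addition)
ci <- est + c(-1, 1) * qt(0.975, df = df) * se

c(t = tStat, p = pValue, percent = pPercent)
ci
```

Converting a probability to a percentage only moves the decimal point two places, so a p-value in the millionths is still a very small percentage.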

Confounding Factor: Education

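We used xtabs to see if there was any relationship between gender and level of education. Note that this table was built from the full CPS85 data set, so it includes the outlier that CPS2 excludes.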

xtabs(~sex+educ, data=CPS85)
##    educ
## sex   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##   F   0   0   0   0   2   1   9   5   5  11 110  16  23   6  34  14   9
##   M   1   1   1   1   1   4   6   7  12  16 109  21  33   7  37  10  22

We crossed “sex” and “level of education” and saw that more men than women are in the workforce with between two and seven years of education (9 men compared with 3 women).
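The table can also be summarized numerically. This is a rough sketch in base R that uses only the counts printed by xtabs() above; the vector names are ours, and the mean years of education by sex is a supplementary check that was not part of the original analysis.

```r
# Years of education and counts copied by hand from the xtabs() output
educYears <- 2:18
nF <- c(0, 0, 0, 0, 2, 1, 9, 5,  5, 11, 110, 16, 23, 6, 34, 14,  9)
nM <- c(1, 1, 1, 1, 1, 4, 6, 7, 12, 16, 109, 21, 33, 7, 37, 10, 22)

# Number of workers of each sex with two to seven years of education
low <- educYears <= 7
lowEduc <- c(F = sum(nF[low]), M = sum(nM[low]))
lowEduc

# Mean years of education by sex, weighting each level by its count
meanEduc <- c(F = weighted.mean(educYears, nF),
              M = weighted.mean(educYears, nM))
round(meanEduc, 2)
```

Comparing the two means gives a quick sense of how large the education difference between the sexes is, which matters when judging education as a confounding factor.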

CPSfemale <- subset(CPS2, sex == "F")
CPSmale <- subset(CPS2, sex == "M")
modEducFemale <- lmGC(wage ~ educ, data = CPSfemale)
modEducMale <- lmGC(wage ~ educ, data=CPSmale)

Next, we used the predict() function to compare men’s and women’s wages.

predict(modEducMale, x = 14)
## Predict wage is about 10.67,
## give or take 4.954 or so for chance variation.
predict(modEducFemale, x = 14)
## Predict wage is about 8.543,
## give or take 3.585 or so for chance variation.

We used “modEducMale” and “modEducFemale” to predict men’s and women’s wages, which yielded some interesting results. At 14 years of education, the predicted wage for men (about $10.67) is about $2.13 higher than the predicted wage for women (about $8.54). However, the chance variation for men’s wages was considerably higher than for women’s.

Then we used two histograms, one to graph the relationship between gender and education…

histogram(~educ | sex, data=CPS2)

The first histogram compares men and women based on their level of education, which shows that, on average, women possess a higher level of education than men.

… and the other to graph wage vs. sex.

histogram(~wage | sex, data=CPS2)

The second histogram shows that more women make a wage of $5-$6 per hour compared to men. Also, a considerable number of men make more than $15 per hour, which can’t be said for the women.

Finally, we used a density plot to show wages broken down by level of education and sex.

densityplot(~wage | educ + sex, data=CPS2)

This density plot displays three different variables: sex, wage, and education. Our results showed us that more females have a higher level of education than males.

Confounding Factor: Experience

The first R chunk I made was to see the relationship between wage and experience, using an xy plot. The plot shows that, in this sample, having more experience does not mean a person will receive a higher wage. Wage and experience do not appear to be strongly related.

xyplot(wage~exper,data=CPS2,
       xlab="Experience",
       ylab="Wage",
       main="Wage and Experience Relationship")

Next I used favstats to compare experience by sex, and then wage by sex. On average, females had almost two years more experience than males, specifically 1.94 years.

For sex and wage, I looked at the mean for males and females. This showed a difference of about $2.27 per hour. Males have an overall higher wage than females, even though females had, on average, more experience.

favstats(exper~sex,data=CPS2)
##   sex min Q1 median Q3 max     mean       sd   n missing
## 1   F   0  9     16 28  49 18.90574 12.58663 244       0
## 2   M   0  8     14 23  55 16.96540 12.13461 289       0
favstats(wage~sex,data=CPS2)
##   sex  min     Q1 median      Q3   max     mean       sd   n missing
## 1   F 1.75 4.7175  6.735  9.8125 24.98 7.728770 4.102386 244       0
## 2   M 1.00 6.0000  8.930 13.0000 26.29 9.994913 5.285854 289       0
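The two differences quoted above come straight from the mean columns of the favstats output. A minimal sketch of that arithmetic in base R follows; the vector names are ours.

```r
# Group means copied by hand from the two favstats() tables above
meanExper <- c(F = 18.90574, M = 16.96540)   # years of experience
meanWage  <- c(F = 7.728770, M = 9.994913)   # dollars per hour

# Females minus males for experience, males minus females for wage
experGap <- unname(meanExper["F"] - meanExper["M"])
wageGap  <- unname(meanWage["M"] - meanWage["F"])

round(c(experience = experGap, wage = wageGap), 2)
```

The two gaps point in opposite directions: the group with more average experience is the group with the lower average wage.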

This density plot shows the relationship between sex and work experience. On average, females have more work experience than males, especially in the range from 20 to 60 years of experience, where the density for females is almost double that for males.

densityplot(~exper|sex,data=CPS2,
            main="Relationship between Work Experience and Sex",
            xlab="Experience (years)",
            layout=c(1,2))

This is an xy plot made to show the relationship between wage, experience, and sex. The graph shows that males are making more money on average than females regardless of how much experience they have. There is a higher concentration of males making more money with less experience than females.

xyplot(wage~exper|sex,data=CPS2,
       main="Wage and Experience Relationship in Sex",
       xlab="Experience",
       ylab="Wage",
       groups=sex)

Confounding Factor: Marriage Status

Table for Marriage Status Men v. Women:

WageAMar <- xtabs(~sex+married,data=CPS2)
WageAMar2 <- rowPerc(xtabs(~sex+married,data=CPS2))
kable(WageAMar)
    Married  Single
F       162      82
M       188     101
kable(WageAMar2)
    Married  Single  Total
F     66.39   33.61    100
M     65.05   34.95    100

Bar charts for Marriage Status of Men v. Women:

barchartGC(~sex+married,data=CPS2,main="Marriage Status")

barchartGC(~sex+married,data=CPS2,main="Marriage Status",type="percent")

Both the table and the bar charts show that about the same percentage of men and women in the study are married (roughly two-thirds of each sex).
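One way to check whether marriage status differs between the sexes is a chi-square test of independence on the counts. This is a supplementary sketch that was not part of the original analysis; it uses base R's chisq.test() (which applies a continuity correction by default for a 2x2 table) rather than the chisqtestGC() function listed in the Methods, and the object names marTab and marTest are ours.

```r
# Counts copied by hand from the sex-by-marriage-status table above
marTab <- matrix(
  c(162,  82,
    188, 101),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("F", "M"),
                  married = c("Married", "Single"))
)

# Chi-square test of independence between sex and marriage status
marTest <- chisq.test(marTab)
marTest$statistic
marTest$p.value
```

A large p-value here would be consistent with the row percentages: the data give no strong sign that marriage status is distributed differently for men and women, so it is unlikely to explain the wage gap on its own.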

Data Summary for Wage & Marriage Status:

favstats(wage~married,data=CPS2)
##   married  min    Q1 median Q3   max     mean       sd   n missing
## 1 Married 1.00 5.620  8.595 12 26.29 9.398486 4.925121 350       0
## 2  Single 2.01 4.585  6.500 10 25.00 8.114098 4.776275 183       0

Now including sex:

favstats(wage~sex+married,data=CPS2)
##   sex.married  min     Q1 median     Q3   max      mean       sd   n
## 1   F.Married 1.75 4.8750  6.880 10.000 23.25  7.683765 3.725468 162
## 2   M.Married 1.00 6.8250  9.845 13.545 26.29 10.876064 5.345956 188
## 3    F.Single 3.35 4.5125  6.450  9.245 24.98  7.817683 4.784327  82
## 4    M.Single 2.01 5.0000  6.670 10.670 25.00  8.354752 4.779962 101
##   missing
## 1       0
## 2       0
## 3       0
## 4       0

While the first code chunk shows there is a difference in wages between married and single people, the second (which breaks it down further by sex) shows an even larger difference in wages. According to the data, men make higher wages than women on average, regardless of their marriage status.

Mean Wages of Single v. Married Men and Women:

with(CPS2,tapply(wage,INDEX=list(married,sex),FUN=mean))
##                F         M
## Married 7.683765 10.876064
## Single  7.817683  8.354752

Married men have a higher average wage than single men, but married women have a slightly lower average wage than single women. This could be due to men being seen as the providers in society’s view of marriage, so they are expected to make more, while the wife is expected to make less because she is not the “provider”.
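The size of these gaps can be worked out from the four group means. The sketch below uses base R and only the means and group sizes printed above; the object names are ours. It also checks that the married and single means from favstats(wage~married) are the weighted averages of the group means.

```r
# Group means and sizes copied by hand from the output above
groupMeans <- matrix(
  c(7.683765, 10.876064,
    7.817683,  8.354752),
  nrow = 2, byrow = TRUE,
  dimnames = list(married = c("Married", "Single"), sex = c("F", "M"))
)
groupN <- matrix(
  c(162, 188,
     82, 101),
  nrow = 2, byrow = TRUE,
  dimnames = dimnames(groupMeans)
)

# Male minus female mean wage within each marriage status
sexGap <- groupMeans[, "M"] - groupMeans[, "F"]

# Married minus single mean wage within each sex
marriageGap <- groupMeans["Married", ] - groupMeans["Single", ]

# Weighted averages recover the means by marriage status alone
overall <- rowSums(groupMeans * groupN) / rowSums(groupN)

round(sexGap, 2)
round(marriageGap, 2)
round(overall, 2)
```

Because this is a single survey and not a follow-up of the same people over time, these gaps compare different groups of people and do not show what happens to one person's wage after marrying.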

Box & Whiskers Plot:

bwplot(wage ~ sex | married, data=CPS2,
       main="Wages vs. Sex, by Marital Status", ylab="Wages (dollars per hour)",
       xlab = "Sex")

The box plot shows that single men and women had very similar median wages but different ranges (men having a higher IQR and a wider range between the min and max values). Among married people, however, men’s median wage is considerably higher than women’s, and their range is considerably wider as well.

Conclusion

In conclusion, males are making higher wages than females in this study sample. Females on average have more work experience, but regardless of work experience, females are still making lower wages than males. Marriage status is associated with both males’ and females’ average wages: married men make higher wages on average than single men, and married women make slightly less than single women. Based on these results, it appears that even when females have the same level of education as males, they receive a lower hourly wage. Although women do tend to choose