Case Study: Gender Discrimination

Attach Packages

For this lab you’ll need to make sure that mosaic and openintro are attached:

library(mosaic)
library(openintro)

The Discrimination Data

Here is the gender discrimination data. Its help page describes the data set and its variables:

help("gender_discrimination")

Summary Statistics

Let’s make a contingency table of the decision based on the gender shown on the resume:

tally(decision ~ gender, data = gender_discrimination)

##           gender
## decision   female male
##   not          10    3
##   promoted     14   21

We can get the percentages, too, if we like:

tally(decision ~ gender, data = gender_discrimination,
      format = "percent", margins = TRUE)

##           gender
## decision      female      male
##   not       41.66667  12.50000
##   promoted  58.33333  87.50000
##   Total    100.00000 100.00000
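To see where these percentages come from, here is a base-R sketch (no packages needed) that rebuilds the counts from the contingency table above and divides each column by its total of 24 resumes. The names `counts` and `pct` are just for illustration; this is a hand check, not how `tally()` works internally.

```r
# Counts from the contingency table; matrix() fills column by column
counts <- matrix(c(10, 14, 3, 21), nrow = 2,
                 dimnames = list(decision = c("not", "promoted"),
                                 gender   = c("female", "male")))

# Divide each column by its column total (margin = 2), then scale to percent
pct <- 100 * prop.table(counts, margin = 2)

# Add a Total row, like margins = TRUE does
rbind(pct, Total = colSums(pct))
```

Each column sums to 100 because every resume in a gender group was either recommended for promotion or not.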

Here’s how to quickly compute the difference between the proportion of women recommended for promotion and the proportion of men recommended for promotion:

diffprop(decision ~ gender, data = gender_discrimination)

##   diffprop 
## -0.2916667
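The same number can be reproduced by hand from the counts. The sketch below makes no claim about how `diffprop()` is implemented; it only checks that the proportion of women promoted minus the proportion of men promoted matches the value shown.

```r
# Counts from the contingency table: 24 resumes per gender
promoted_female <- 14
promoted_male   <- 21
n_per_gender    <- 24

p_female <- promoted_female / n_per_gender   # about 0.583
p_male   <- promoted_male / n_per_gender     # 0.875
p_female - p_male
## [1] -0.2916667
```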

Why is the result a negative number?

YOUR ANSWER

Simulating Once

We can perform the simulation suggested in the video by “shuffling” the gender variable. The following code creates a data frame where the resumes have been randomly reassigned to the 48 managers:

simulated <- data.frame(
  gender = shuffle(gender_discrimination$gender),
  decision = gender_discrimination$decision
)
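Shuffling only rearranges the gender labels; it never changes how many women, men, or promotions there are. Here is a base-R sketch of that idea, assuming `shuffle()` behaves like a plain random permutation here and using `sample()` (which permutes a vector when given no size) in its place. The seed and variable names are arbitrary.

```r
set.seed(3030)  # arbitrary seed so the sketch is reproducible

# Rebuild the 48 (gender, decision) pairs from the original contingency table
gender   <- rep(c("female", "male", "female", "male"), times = c(10, 3, 14, 21))
decision <- rep(c("not", "not", "promoted", "promoted"), times = c(10, 3, 14, 21))

# sample() with a single argument returns a random permutation of the vector
shuffled_gender <- sample(gender)

# Margins are unchanged: still 24 women, 24 men, and 35 promotions in total
table(shuffled_gender)
tab <- table(decision, shuffled_gender)
tab
```

Only the pairing between a resume's gender and the manager's decision changes, which is exactly what "no discrimination" would allow.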

Take a look at it:

View(simulated)

Make a contingency table of the simulation:

tally(decision ~ gender, data = simulated)

##           gender
## decision   female male
##   not           6    7
##   promoted     18   17

See the percentages:

tally(decision ~ gender, data = simulated,
      format = "percent", margins = TRUE)

##           gender
## decision      female      male
##   not       25.00000  29.16667
##   promoted  75.00000  70.83333
##   Total    100.00000 100.00000

Compute the difference in percentages:

diffprop(decision ~ gender, data = simulated)

##   diffprop 
## 0.04166667

Actually, you can do it all in one go. The following code randomly assigns the resumes to the managers and computes the difference in proportions, all in one line:

diffprop(decision ~ shuffle(gender), data = gender_discrimination)

##   diffprop 
## -0.2083333

Try the above code ten times. How many times did you get a difference of -29% or less?

YOUR ANSWER
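If you want to check your ten tries, here is a base-R sketch that shuffles ten times and counts the results at or below -0.29. The helper `one_shuffle_diff()` is hypothetical (not part of mosaic), and `sample()` stands in for `shuffle()`, so your numbers will differ from this run.

```r
# Rebuild the 48 (gender, decision) pairs from the original contingency table
gender   <- rep(c("female", "male", "female", "male"), times = c(10, 3, 14, 21))
decision <- rep(c("not", "not", "promoted", "promoted"), times = c(10, 3, 14, 21))

# Hypothetical helper: shuffle the gender labels once and return the
# proportion of women promoted minus the proportion of men promoted
one_shuffle_diff <- function() {
  g <- sample(gender)
  mean(decision[g == "female"] == "promoted") -
    mean(decision[g == "male"] == "promoted")
}

set.seed(1234)  # arbitrary; change it to get a different set of ten
ten_diffs <- replicate(10, one_shuffle_diff())
round(ten_diffs, 4)

# How many of the ten were -0.29 or less?
sum(ten_diffs <= -0.29)
```

Because each group always has 24 resumes and there are 35 promotions in total, a shuffle that gives a women promoted yields a difference of (2a - 35)/24. So the only possible values are odd multiples of 1/24, and the observed -0.2917 is exactly -7/24.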

Simulate Many Times at Once

In the video the simulation was repeated 100 times. We can repeat it many more times, too. If you go further in statistics you will learn how to write programs to do simulations. The following small program repeats the simulation 999 times and makes a tally of the results. (Before running it, change the number in the set.seed() command to a four-digit number of your choosing.)

set.seed(3030) ## you should replace 3030 with your own number!
observed <- diffprop(decision ~ gender, data = gender_discrimination)
nullDist <- do(999) * diffprop(decision ~ shuffle(gender), data = gender_discrimination)
statTally(observed, nullDist, binwidth = 0.01)

## Null distribution appears to be asymmetric. (p = 7.99e-33)
## 
## Test statistic applied to sample data =  -0.2917
## 
## Quantiles of test statistic applied to random data:
##         50%         90%         95%         99% 
## -0.04166667  0.12500000  0.20833333  0.29166667
## 
## Of the 1000 samples (1 original + 999 random),
## 
##  15 ( 1.5 % ) had test stats = -0.2917
## 
##  18 ( 1.8 % ) had test stats <= -0.2917
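For comparison, here is a base-R sketch of the same permutation test without mosaic. It is an illustration under stated assumptions, not the code behind `statTally()`: `sample()` replaces `shuffle()`, `diff_promoted()` is a hypothetical helper, and the p-value counts the original data as one of 1000 samples, mirroring the "1 original + 999 random" line above. Because the random draws differ, expect a percentage near, but not identical to, the one above.

```r
set.seed(3030)  # replace 3030 with your own four-digit number

# Rebuild the 48 (gender, decision) pairs from the contingency table
gender   <- rep(c("female", "male", "female", "male"), times = c(10, 3, 14, 21))
decision <- rep(c("not", "not", "promoted", "promoted"), times = c(10, 3, 14, 21))

# Hypothetical helper: proportion of women promoted minus proportion of men
diff_promoted <- function(g) {
  mean(decision[g == "female"] == "promoted") -
    mean(decision[g == "male"] == "promoted")
}

observed  <- diff_promoted(gender)                          # -7/24, about -0.2917
null_dist <- replicate(999, diff_promoted(sample(gender)))  # 999 shuffles

# Count shuffles at least as extreme as the observed difference (the small
# tolerance guards against floating-point noise), then add 1 for the original
n_extreme <- sum(null_dist <= observed + 1e-9)
p_value   <- (n_extreme + 1) / (999 + 1)
p_value
```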

What percentage of the time did the simulations show a difference of -29% or less?

YOUR ANSWER

If there were no discrimination against women, about how likely would it be to see a difference of -29% or less?

YOUR ANSWER

What do you think? Did the experiment provide sufficiently strong evidence to conclude that among the managers there was some discrimination against women?

YOUR ANSWER

Your Name Here!

2019-07-30
