Attach Packages
For this lab you’ll need to make sure that mosaic and openintro are attached:
Summary Statistics
Let’s make a contingency table of the decision based on the gender shown on the resume:
##           gender
## decision   female male
##   not          10    3
##   promoted     14   21
We can get the percentages, too, if we like:
##           gender
## decision      female      male
##   not       41.66667  12.50000
##   promoted  58.33333  87.50000
##   Total    100.00000 100.00000
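The two tables above can also be reproduced in base R, which may help if you want to see what is being counted. The sketch below is only an illustration: it rebuilds the 48 resumes from the counts shown in the contingency table instead of loading the lab's data set, and the name `resumes` is a hypothetical stand-in for `gender_discrimination`.

```r
# Rebuild the 48 resumes from the counts in the contingency table.
# `resumes` is a hypothetical name; the lab itself uses gender_discrimination.
resumes <- data.frame(
  gender   = rep(c("female", "female", "male", "male"), times = c(10, 14, 3, 21)),
  decision = rep(c("not", "promoted", "not", "promoted"), times = c(10, 14, 3, 21))
)

# Counts: rows are decisions, columns are genders.
counts <- table(decision = resumes$decision, gender = resumes$gender)
print(counts)

# Column percentages: each gender's column adds up to 100.
percents <- 100 * prop.table(counts, margin = 2)
print(round(percents, 2))
```

Using `margin = 2` makes the percentages add up within each gender, which matches the layout of the percentage table above.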
Here’s how to quickly compute the difference between the proportion of women recommended for promotion and the proportion of men:
## diffprop
## -0.2916667
Why is the result a negative number?
YOUR ANSWER
Simulating Once
We can perform the simulation suggested in the video by “shuffling” the gender variable. The following code creates a data frame where the resumes have been randomly reassigned to the 48 managers:
simulated <- data.frame(
gender = shuffle(gender_discrimination$gender),
decision = gender_discrimination$decision
)
Take a look at it:
Make a contingency table of the simulation:
##           gender
## decision   female male
##   not           6    7
##   promoted     18   17
See the percentages:
##           gender
## decision      female      male
##   not       25.00000  29.16667
##   promoted  75.00000  70.83333
##   Total    100.00000 100.00000
Compute the difference in percentages:
## diffprop
## 0.04166667
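To see what a single shuffle does under the hood, here is a minimal base R sketch. It assumes the data can be rebuilt from the counts in the first contingency table; `resumes` and `diff_in_props()` are hypothetical names, and base R's `sample()` stands in for mosaic's `shuffle()`. Because the shuffle is random, your simulated difference will generally differ from the one shown above.

```r
# Rebuild the 48 resumes from the observed counts (hypothetical stand-in
# for the lab's gender_discrimination data frame).
resumes <- data.frame(
  gender   = rep(c("female", "female", "male", "male"), times = c(10, 14, 3, 21)),
  decision = rep(c("not", "promoted", "not", "promoted"), times = c(10, 14, 3, 21))
)

# Proportion promoted among women minus proportion promoted among men.
# This is a hand-written helper, not mosaic's diffprop().
diff_in_props <- function(gender, decision) {
  mean(decision[gender == "female"] == "promoted") -
    mean(decision[gender == "male"] == "promoted")
}

# Observed difference in the real data.
observed <- diff_in_props(resumes$gender, resumes$decision)

# One simulation: permute the gender labels, keep the decisions fixed.
set.seed(3030)  # any seed works; replace with your own
shuffled_gender <- sample(resumes$gender)
simulated_diff <- diff_in_props(shuffled_gender, resumes$decision)

print(observed)
print(simulated_diff)
```

Shuffling only the labels keeps 24 resumes in each gender group and 35 promotions overall, so every simulated difference is a multiple of 1/24.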
Actually, you can do it all in one go. The following code randomly assigns the resumes to the managers and computes the difference in proportions, all in one line:
## diffprop
## -0.2083333
Try the above code ten times. How many times did you get a difference of -29% or less?
YOUR ANSWER
Simulate Many Times at Once
In the video the simulation was repeated 100 times. We can repeat it many more times, too. If you go further in statistics you will learn how to write programs to do simulations. The following small program repeats the simulation 999 times and makes a tally of the results. (Before running it, change the number in the set.seed() command to a four-digit number of your choosing.)
set.seed(3030) ## you should replace 3030 with your own number!
observed <- diffprop(decision ~ gender, data = gender_discrimination)
nullDist <- do(999) * diffprop(decision ~ shuffle(gender), data = gender_discrimination)
statTally(observed, nullDist, binwidth = 0.01)
## Null distribution appears to be asymmetric. (p = 7.99e-33)
##
## Test statistic applied to sample data = -0.2917
##
## Quantiles of test statistic applied to random data:
## 50% 90% 95% 99%
## -0.04166667 0.12500000 0.20833333 0.29166667
##
##
## Of the 1000 samples (1 original + 999 random),
##
## 15 ( 1.5 % ) had test stats = -0.2917
##
## 18 ( 1.8 % ) had test stats <= -0.2917
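The tally above can be approximated by hand in base R, which shows where the "1 original + 999 random" count comes from. This is a sketch under stated assumptions, not the mosaic implementation: the data are rebuilt from the counts in the first contingency table, `resumes` and `diff_in_props()` are hypothetical names, and `replicate()` with `sample()` stands in for `do()` with `shuffle()`. The exact share will not match the output above, since the random draws differ.

```r
# Rebuild the 48 resumes from the observed counts (hypothetical stand-in
# for the lab's gender_discrimination data frame).
resumes <- data.frame(
  gender   = rep(c("female", "female", "male", "male"), times = c(10, 14, 3, 21)),
  decision = rep(c("not", "promoted", "not", "promoted"), times = c(10, 14, 3, 21))
)

# Proportion promoted among women minus proportion promoted among men.
diff_in_props <- function(gender, decision) {
  mean(decision[gender == "female"] == "promoted") -
    mean(decision[gender == "male"] == "promoted")
}

observed <- diff_in_props(resumes$gender, resumes$decision)

# Repeat the shuffle 999 times and record each simulated difference.
set.seed(3030)  # replace with your own four-digit number
null_diffs <- replicate(
  999,
  diff_in_props(sample(resumes$gender), resumes$decision)
)

# Count the observed value together with the 999 simulated ones.
# The small tolerance guards against floating-point ties.
all_diffs <- c(observed, null_diffs)
share_at_or_below <- mean(all_diffs <= observed + 1e-12)

print(share_at_or_below)
```

Including the observed value among the 1000 results means the share can never be exactly zero, which mirrors the way the tally above is reported.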
What percentage of the time did the simulations show a difference of -29% or less?
YOUR ANSWER
If there is no discrimination against women, then about what is the chance of seeing a difference of -29% or less?
YOUR ANSWER
What do you think? Did the experiment provide sufficiently strong evidence to conclude that among the managers there was some discrimination against women?
YOUR ANSWER