library(tidyverse)
library(knitr)
library(kableExtra)
library(tidymodels)
setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets")
<- read_csv("sex_discrimination.csv")
sex_disc set.seed(9753) # allows to replicate results each time document is rendered
Chapter 11-12 - Statistical Inference
Load library, set the working directory, and dataset
Two-Way table
<- table(sex_disc)
decision_table kbl(decision_table, caption = "Two-Way Table") %>% kable_styling() %>% row_spec(row = 0, color = "dodgerblue")
not promoted | promoted | |
---|---|---|
female | 10 | 14 |
male | 3 | 21 |
kbl(round(proportions(decision_table, "sex"),2)) %>% # marginal proportions by sex for promotion decision
kable_styling()%>%
row_spec(row = 0, color = "dodgerblue")
not promoted | promoted | |
---|---|---|
female | 0.42 | 0.58 |
male | 0.12 | 0.88 |
We can see that 10/24 (41.7%) women were not promoted and 3/24 (12.5%) men were not promoted. The question is, does this appear to be clear bias, or are the differences in proportions due to random chance?
Random permutations
One random permutation
Using the mutate() and sample() functions, the vector of promotion decision is mixed up, or permuted, such that whether someone is male or female can’t possibly be causing any difference in proportions. However, due to inherent natural variability, there is also no expectation that the promotion decisions are exactly the same for any sample. We use the sample() function to create the shuffled promotion decisions, and then save that shuffled dataset into a new variable named decision_perm, using the mutate() function.
<- sex_disc %>%
perm1 mutate(decision_perm = sample(decision)) %>%
group_by(sex) %>%
summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
prop_promoted = mean(decision == "promoted")) %>%
summarize(diff_perm = diff(prop_promoted_perm),
diff_orig = diff(prop_promoted)) # promoted - not promoted
kbl(perm1)%>%
kable_styling() %>%
row_spec(row = 0, color = "dodgerblue")
diff_perm | diff_orig |
---|---|
-0.2083333 | 0.2916667 |
We can see that the original (observed) difference in proportions was 41/7%-12.5%=29.2%
The permutated sample difference in proportions was different. Notice it will change each time you run this chunk due to random chance.
Many random permutations
By repeating the permutation and difference calculations five times, the permuted differences are seen to be sometimes positive, sometimes negative, sometimes close to zero, sometimes far from zero. However, five times isn’t quite enough to capture all of the variability in the null differences.
The rep_sample_n() function performs repeated sampling of a dataset, where the size of the sample is specified in the first argument and the number of repetitions is specified in the reps argument. We notice that the sample size is the same size as the sex discrimination dataset, since it equals the number of rows in the original dataset. We are also specifying that the sampling of the datset should be done without replacement (replace = FALSE), since we want each row to only be selected once. You can think of this as creating five copies of the original sex discrimination dataset.
<- sex_disc %>%
perm_reps rep_sample_n(size = nrow(sex_disc), reps = 10, replace = FALSE) %>%
mutate(decision_perm = sample(decision)) %>%
group_by(replicate, sex) %>%
summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
prop_promoted = mean(decision == "promoted")) %>%
summarize(diff_perm = diff(prop_promoted_perm),
diff_orig = diff(prop_promoted)) # promoted - not promoted
`summarise()` has grouped output by 'replicate'. You can override using the
`.groups` argument.
kbl(perm_reps) %>%
kable_styling() %>%
row_spec(row = 0, color = "dodgerblue")
replicate | diff_perm | diff_orig |
---|---|---|
1 | 0.0416667 | 0.2916667 |
2 | -0.1250000 | 0.2916667 |
3 | -0.2083333 | 0.2916667 |
4 | 0.0416667 | 0.2916667 |
5 | -0.1250000 | 0.2916667 |
6 | -0.2083333 | 0.2916667 |
7 | -0.1250000 | 0.2916667 |
8 | 0.2916667 | 0.2916667 |
9 | -0.1250000 | 0.2916667 |
10 | -0.0416667 | 0.2916667 |
Notice the variation in the permutation differences.
Repeat the permutation 100 times and plot all differences
<- sex_disc %>%
perm_100_reps rep_sample_n(size = nrow(sex_disc), reps = 100, replace = FALSE) %>%
mutate(decision_perm = sample(decision)) %>%
group_by(replicate, sex) %>%
summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
prop_promoted = mean(decision == "promoted")) %>%
summarize(diff_perm = diff(prop_promoted_perm),
diff_orig = diff(prop_promoted)) # promoted - not promoted
`summarise()` has grouped output by 'replicate'. You can override using the
`.groups` argument.
<- ggplot(perm_100_reps, aes(diff_perm)) +
plot_perm geom_dotplot(binwidth = 0.01) +
geom_vline(xintercept = c(-0.292, 0.292), color = "blue", linetype = "dotted", linewidth=1) +
labs(title = "Dotplot of 100 Replications of Permutations of Differences of Proportions")
plot_perm
Remember that each dot represents a different permutation of the differences in proportions of promotions between males and females.
Although the plot will appear differently each time due to randomization, notice that very few points are > 0.292 or < -0.292. Visually, this shows that it is less likely that the difference of 29.2% is due to random chance, and more likely that discrimination of promotions was occurring based on sex.
Use the infer framework
Randomized data under null model of independence
step through specifying the null model and then performing 100 permutations to evaluate whether decision status differs between the “female” and “male” groups
specify() that the relationship of interest is decision vs. sex and a success in this context is promotion, set success to “promoted.
# Hypothesize independence
<- sex_disc %>%
decide_perm specify(decision ~ sex, success = "promoted") %>%
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "diff in props", order = c("male", "female"))
decide_perm
Response: decision (factor)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 100 × 2
replicate stat
<int> <dbl>
1 1 0.0417
2 2 0.0417
3 3 -0.125
4 4 -0.208
5 5 -0.208
6 6 0.0417
7 7 0.208
8 8 0.125
9 9 0.208
10 10 0.0417
# ℹ 90 more rows
Calculate a p-value for the above hypothesis test
<- perm_100_reps %>%
p_value summarize(count = sum(diff_orig <= diff_perm),
proportion = mean(diff_orig <= diff_perm))
p_value
# A tibble: 1 × 2
count proportion
<int> <dbl>
1 0 0
Conclusion based on p-value
This p-value suggests that such a difference from chance alone, assuming the null hypothesis was true, would be rare: it would only happen about 2 (or other value depending on each replication) in 100 times. When results like these are inconsistent with Ho, we reject Ho in favor of Ha. Here, we concluded there was discrimination against female candidates. The 2-in-100 chance is what we call a p-value, which is a probability quantifying the strength of the evidence against the null hypothesis, given the observed data.
Confidence Intervals
Bootstrap percentile interval The main idea in the previous exercise was that the distance between the original sample p-hat.
And the resampled (or bootstrapped) p-hat values gives a measure for how far the original p-hat is from the true population proportion.
The same variability can be measured through a different mechanism. As before, if p-hat is sufficiently close to the true parameter, then the resampled (or bootstrapped) p-hat values will vary in such a way that they overlap with the true parameter.
Create bootstrap distribution of proportion of females promoted
# Compute p-hat* for each resampled proportion of females who were promoted
<- sex_disc %>%
fprops filter(sex == "female") %>%
# Specify vote as the response, where yes means success
specify(response = decision, success = "promoted") %>%
# Generate 1000 reps of type bootstrap
generate(reps = 1000, type = "bootstrap") %>%
# Calculate the summary stat "prop"
calculate(stat = "prop")
fprops
Response: decision (factor)
# A tibble: 1,000 × 2
replicate stat
<int> <dbl>
1 1 0.708
2 2 0.708
3 3 0.583
4 4 0.583
5 5 0.708
6 6 0.5
7 7 0.458
8 8 0.542
9 9 0.625
10 10 0.542
# ℹ 990 more rows
Find an interval of values that are plausible for the true parameter by calculating p-hat±2SE
The lower bound of the confidence interval is p_hat minus twice the standard error of stat. Use sd() to calculate the standard error. The upper bound is p_hat plus twice the standard error of stat.
# Manually calculate a 95% percentile interval
%>%
fprops summarize(
lower = quantile(stat, p = 0.025),
upper = quantile(stat, p = 0.975)
)
# A tibble: 1 × 2
lower upper
<dbl> <dbl>
1 0.375 0.751
# Calculate the same interval, more conveniently
<- fprops %>%
percentile_ci get_confidence_interval(level = 0.95)
percentile_ci
# A tibble: 1 × 2
lower_ci upper_ci
<dbl> <dbl>
1 0.375 0.751
Conclusion
We are 95% confident that the true proportion of women who are promoted is between 37.5% and 75.1%.
Note - this is a wide range of values, so not very precise