Chapter 11-12 - Statistical Inference

Author

Rachel Saidi

Load library, set the working directory, and dataset

library(tidyverse)
library(knitr)
library(kableExtra)
library(tidymodels)
setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets")
sex_disc <- read_csv("sex_discrimination.csv")
set.seed(9753)  # allows to replicate results each time document is rendered

Two-Way table

decision_table  <- table(sex_disc)
kbl(decision_table, caption = "Two-Way Table") %>% kable_styling() %>% row_spec(row = 0, color = "dodgerblue")
Two-Way Table
not promoted promoted
female 10 14
male 3 21
kbl(round(proportions(decision_table, "sex"),2)) %>% # marginal proportions by sex for promotion decision 
kable_styling()%>%
    row_spec(row = 0, color = "dodgerblue")
not promoted promoted
female 0.42 0.58
male 0.12 0.88

We can see that 10/24 (41.7%) women were not promoted and 3/24 (12.5%) men were not promoted. The question is, does this appear to be clear bias, or are the differences in proportions due to random chance?

Random permutations

One random permutation

Using the mutate() and sample() functions, the vector of promotion decision is mixed up, or permuted, such that whether someone is male or female can’t possibly be causing any difference in proportions. However, due to inherent natural variability, there is also no expectation that the promotion decisions are exactly the same for any sample. We use the sample() function to create the shuffled promotion decisions, and then save that shuffled dataset into a new variable named decision_perm, using the mutate() function.

perm1 <- sex_disc %>%
  mutate(decision_perm = sample(decision)) %>%
  group_by(sex) %>%
  summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
            prop_promoted = mean(decision == "promoted")) %>%
  summarize(diff_perm = diff(prop_promoted_perm), 
            diff_orig = diff(prop_promoted))  # promoted - not promoted
kbl(perm1)%>%
  kable_styling() %>%
    row_spec(row = 0, color = "dodgerblue")
diff_perm diff_orig
-0.2083333 0.2916667

We can see that the original (observed) difference in proportions was 41/7%-12.5%=29.2%

The permutated sample difference in proportions was different. Notice it will change each time you run this chunk due to random chance.

Many random permutations

By repeating the permutation and difference calculations five times, the permuted differences are seen to be sometimes positive, sometimes negative, sometimes close to zero, sometimes far from zero. However, five times isn’t quite enough to capture all of the variability in the null differences.

The rep_sample_n() function performs repeated sampling of a dataset, where the size of the sample is specified in the first argument and the number of repetitions is specified in the reps argument. We notice that the sample size is the same size as the sex discrimination dataset, since it equals the number of rows in the original dataset. We are also specifying that the sampling of the datset should be done without replacement (replace = FALSE), since we want each row to only be selected once. You can think of this as creating five copies of the original sex discrimination dataset.

perm_reps <- sex_disc %>%
  rep_sample_n(size = nrow(sex_disc), reps = 10, replace = FALSE) %>%
  mutate(decision_perm = sample(decision)) %>%
  group_by(replicate, sex) %>%
  summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
            prop_promoted = mean(decision == "promoted")) %>%
  summarize(diff_perm = diff(prop_promoted_perm), 
            diff_orig = diff(prop_promoted))  # promoted - not promoted
`summarise()` has grouped output by 'replicate'. You can override using the
`.groups` argument.
kbl(perm_reps) %>%
  kable_styling() %>%
    row_spec(row = 0, color = "dodgerblue")
replicate diff_perm diff_orig
1 0.0416667 0.2916667
2 -0.1250000 0.2916667
3 -0.2083333 0.2916667
4 0.0416667 0.2916667
5 -0.1250000 0.2916667
6 -0.2083333 0.2916667
7 -0.1250000 0.2916667
8 0.2916667 0.2916667
9 -0.1250000 0.2916667
10 -0.0416667 0.2916667

Notice the variation in the permutation differences.

Repeat the permutation 100 times and plot all differences

perm_100_reps <- sex_disc %>%
  rep_sample_n(size = nrow(sex_disc), reps = 100, replace = FALSE) %>%
  mutate(decision_perm = sample(decision)) %>%
  group_by(replicate, sex) %>%
  summarize(prop_promoted_perm = mean(decision_perm == "promoted"),
            prop_promoted = mean(decision == "promoted")) %>%
  summarize(diff_perm = diff(prop_promoted_perm), 
            diff_orig = diff(prop_promoted))  # promoted - not promoted
`summarise()` has grouped output by 'replicate'. You can override using the
`.groups` argument.
plot_perm <- ggplot(perm_100_reps, aes(diff_perm)) +
  geom_dotplot(binwidth = 0.01) +
  geom_vline(xintercept = c(-0.292, 0.292), color = "blue", linetype = "dotted", linewidth=1) +
  labs(title = "Dotplot of 100 Replications of Permutations of Differences of Proportions")
plot_perm

Remember that each dot represents a different permutation of the differences in proportions of promotions between males and females.

Although the plot will appear differently each time due to randomization, notice that very few points are > 0.292 or < -0.292. Visually, this shows that it is less likely that the difference of 29.2% is due to random chance, and more likely that discrimination of promotions was occurring based on sex.

Use the infer framework

Randomized data under null model of independence

  • step through specifying the null model and then performing 100 permutations to evaluate whether decision status differs between the “female” and “male” groups

  • specify() that the relationship of interest is decision vs. sex and a success in this context is promotion, set success to “promoted.

# Hypothesize independence
decide_perm <- sex_disc %>%
  specify(decision ~ sex, success = "promoted") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>% 
  calculate(stat = "diff in props", order = c("male", "female"))
decide_perm
Response: decision (factor)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 100 × 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0417
 2         2  0.0417
 3         3 -0.125 
 4         4 -0.208 
 5         5 -0.208 
 6         6  0.0417
 7         7  0.208 
 8         8  0.125 
 9         9  0.208 
10        10  0.0417
# ℹ 90 more rows

Calculate a p-value for the above hypothesis test

p_value <- perm_100_reps %>%
  summarize(count = sum(diff_orig <= diff_perm), 
            proportion = mean(diff_orig <= diff_perm))
p_value
# A tibble: 1 × 2
  count proportion
  <int>      <dbl>
1     0          0

Conclusion based on p-value

This p-value suggests that such a difference from chance alone, assuming the null hypothesis was true, would be rare: it would only happen about 2 (or other value depending on each replication) in 100 times. When results like these are inconsistent with Ho, we reject Ho in favor of Ha. Here, we concluded there was discrimination against female candidates. The 2-in-100 chance is what we call a p-value, which is a probability quantifying the strength of the evidence against the null hypothesis, given the observed data.

Confidence Intervals

Bootstrap percentile interval The main idea in the previous exercise was that the distance between the original sample p-hat.

And the resampled (or bootstrapped) p-hat values gives a measure for how far the original p-hat is from the true population proportion.

The same variability can be measured through a different mechanism. As before, if p-hat is sufficiently close to the true parameter, then the resampled (or bootstrapped) p-hat values will vary in such a way that they overlap with the true parameter.

Create bootstrap distribution of proportion of females promoted

# Compute p-hat* for each resampled proportion of females who were promoted
fprops <- sex_disc %>%
  filter(sex == "female") %>%
  # Specify vote as the response, where yes means success
  specify(response = decision, success = "promoted") %>%
  # Generate 1000 reps of type bootstrap
  generate(reps = 1000, type = "bootstrap") %>% 
  # Calculate the summary stat "prop"
  calculate(stat = "prop")
fprops
Response: decision (factor)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1 0.708
 2         2 0.708
 3         3 0.583
 4         4 0.583
 5         5 0.708
 6         6 0.5  
 7         7 0.458
 8         8 0.542
 9         9 0.625
10        10 0.542
# ℹ 990 more rows

Find an interval of values that are plausible for the true parameter by calculating p-hat±2SE

The lower bound of the confidence interval is p_hat minus twice the standard error of stat. Use sd() to calculate the standard error. The upper bound is p_hat plus twice the standard error of stat.

# Manually calculate a 95% percentile interval
fprops %>%
  summarize(
    lower = quantile(stat, p = 0.025),
    upper = quantile(stat, p = 0.975)
  )
# A tibble: 1 × 2
  lower upper
  <dbl> <dbl>
1 0.375 0.751
# Calculate the same interval, more conveniently
percentile_ci <- fprops %>% 
  get_confidence_interval(level = 0.95)
percentile_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    0.375    0.751

Conclusion

We are 95% confident that the true proportion of women who are promoted is between 37.5% and 75.1%.

Note - this is a wide range of values, so not very precise