Suppose that you wanted to investigate whether there is a gender difference in musical preference among Plymouth State University students. To investigate this question, you took a small sample of PSU students. The sample data is stored in music.csv.

Import music.csv from Moodle (under Data Files) and test the hypothesis as described below.

# Load packages
library(dplyr)
library(ggplot2)
library(infer)

# Import data
music <- read.csv("/resources/rstudio/Business Statistics/data/music.csv")
head(music)
##   singer    sex
## 1 Grande female
## 2 Grande female
## 3 Grande female
## 4 Grande female
## 5 Grande female
## 6 Grande female

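Q1 How many male students in the sample reported preferring Imagine Dragons?

All 15 male students in the sample prefer Imagine Dragons: the count output below shows 15 male/Dragons rows and no male/Grande rows.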

music %>%
  # Count the rows by singer and sex
  count(sex, singer)
## # A tibble: 2 x 3
##   sex    singer      n
##   <fct>  <fct>   <int>
## 1 female Grande     14
## 2 male   Dragons    15
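
As a cross-check that does not need the CSV file, the sketch below rebuilds the sample from the counts printed above. It assumes, as that output indicates, that the file contains only 14 female/Grande rows and 15 male/Dragons rows; the name music_rebuilt is hypothetical. Base R's table() is used because, unlike count() with its default settings, it also lists combinations that never occur as explicit zeros.

```r
# Hypothetical reconstruction of music.csv from the printed counts:
# 14 female students who prefer Grande, 15 male students who prefer Dragons
music_rebuilt <- data.frame(
  singer = factor(c(rep("Grande", 14), rep("Dragons", 15)),
                  levels = c("Dragons", "Grande")),
  sex    = factor(c(rep("female", 14), rep("male", 15)),
                  levels = c("female", "male"))
)

# Two-way table of sex by singer; empty combinations appear as 0
tab <- table(music_rebuilt$sex, music_rebuilt$singer)
tab
```

The zero in the male/Grande cell makes it explicit that no male student in this sample chose Grande.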

Interpretation

Q2 What percentage of male students in the sample reported preferring Imagine Dragons?

100 percent of male students in the sample prefer Imagine Dragons (Dragons_prop is 1 for males in the output below).

# Find the proportion of each sex who prefer Imagine Dragons
music %>%
  # Group by sex
  group_by(sex) %>%
  # The mean of a logical vector is the proportion of TRUE values,
  # i.e. the share of each group preferring Dragons
  summarise(Dragons_prop = mean(singer == "Dragons"))
## # A tibble: 2 x 2
##   sex    Dragons_prop
##   <fct>         <dbl>
## 1 female            0
## 2 male              1
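
The same proportions can be sketched in base R with tapply(), again assuming the sample rebuilt from the printed counts (14 female/Grande, 15 male/Dragons). It relies on the same idea as the dplyr code: the mean of a logical vector is the share of TRUE values.

```r
# Hypothetical reconstruction of the sample from the printed counts
sex    <- c(rep("female", 14), rep("male", 15))
singer <- c(rep("Grande", 14), rep("Dragons", 15))

# Proportion preferring Dragons within each sex
dragons_prop <- tapply(singer == "Dragons", sex, mean)
dragons_prop

# The same proportions expressed as percentages
round(100 * dragons_prop, 1)
```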

Interpretation

Q3 What is the observed difference in the proportions between male and female (male - female) students in the sample?

The observed difference is 1 - 0 = 1: the proportion preferring Imagine Dragons is 1 among male students and 0 among female students (diff_orig in the code below).

Q4 What does this mean?

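In this sample, male students are 100 percentage points more likely than female students to prefer Imagine Dragons: every male student chose Imagine Dragons and every female student chose Grande.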

Q5 There might be a few negative permuted differences. What would a negative difference mean?

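A negative permuted difference would mean that, in that shuffled dataset, a smaller proportion of male students than of female students ended up with a preference for Imagine Dragons, i.e. a difference in the opposite direction from the one observed.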
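To see when a shuffle produces a negative value, the sketch below enumerates every possible permuted difference. It assumes the group sizes from the output (15 male, 14 female, 15 Dragons answers in total). A shuffle keeps those totals fixed and only changes k, the number of Dragons answers that land in the male group, so the male - female difference is k/15 - (15 - k)/14.

```r
# Totals taken from the printed output; a shuffle does not change them
n_male    <- 15
n_female  <- 14
n_dragons <- 15

# k = number of "Dragons" answers assigned to the male group after a shuffle.
# At least 1 must go to males because the female group has only 14 students.
k <- 1:15
perm_diff <- k / n_male - (n_dragons - k) / n_female

data.frame(k = k, diff = round(perm_diff, 4))

# Allocations that give a negative male - female difference
k[perm_diff < 0]
```

Only shuffles that put 7 or fewer of the 15 Dragons answers in the male group give a negative difference. This also explains why the permuted statistics printed further down take only a few distinct values such as 0.0333 (k = 8) and -0.105 (k = 7).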

# Calculate the observed difference in the proportion preferring Dragons
diff_orig <- music %>%
  # Group by sex (groups appear in level order: female, then male)
  group_by(sex) %>%
  # Summarize to calculate the proportion preferring Dragons in each group
  summarise(prop_dragons = mean(singer == "Dragons")) %>%
  # diff() subtracts the first row from the second: male - female
  summarise(stat = diff(prop_dragons)) %>% 
  pull()
    
# See the result
diff_orig # male - female
## [1] 1
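
Why the result is male - female: group_by() returns the groups in factor-level order, which here is alphabetical (female, then male), and diff() subtracts each element from the one after it. The sketch below uses the two proportions printed above; reversing the order flips the sign, which is why calculate() is given order = c("male", "female") in the permutation step.

```r
# Group proportions from the printed summary, in group_by() order
prop_dragons <- c(female = 0, male = 1)

# diff() computes later minus earlier: male - female
diff(prop_dragons)

# Reversing the order gives female - male instead
diff(rev(prop_dragons))
```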

# Create data frame of permuted differences in the proportion preferring Dragons
music_perm <- music %>%
  # Specify variables: singer (response variable) and sex (explanatory variable)
  specify(singer ~ sex, success = "Dragons") %>%
  # Set null hypothesis as independence: singer preference is unrelated to sex
  hypothesize(null = "independence") %>%
  # Shuffle the response variable, singer, one thousand times
  generate(reps = 1000, type = "permute") %>%
  # Calculate difference in proportion, male then female
  calculate(stat = "diff in props", order = c("male", "female")) # male - female
  
music_perm
## # A tibble: 1,000 x 2
##    replicate    stat
##        <int>   <dbl>
##  1         1  0.0333
##  2         2  0.310 
##  3         3 -0.243 
##  4         4  0.0333
##  5         5  0.0333
##  6         6  0.310 
##  7         7 -0.105 
##  8         8 -0.105 
##  9         9  0.0333
## 10        10  0.0333
## # ... with 990 more rows
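
For intuition, the permutation step can be sketched by hand in base R. This is a hand-rolled analogue of specify() / hypothesize() / generate() / calculate(), not infer's internal code, and it assumes the sample rebuilt from the printed counts; the helper diff_in_props() and the seed are hypothetical choices for illustration.

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical reconstruction of the sample from the printed counts
sex    <- c(rep("female", 14), rep("male", 15))
singer <- c(rep("Grande", 14), rep("Dragons", 15))

# Difference in proportions preferring Dragons: male minus female
diff_in_props <- function(singer, sex) {
  mean(singer[sex == "male"] == "Dragons") -
    mean(singer[sex == "female"] == "Dragons")
}

obs_stat <- diff_in_props(singer, sex)

# Shuffle the response (singer) while keeping sex fixed, 1000 times;
# this breaks any link between the two variables, as the null assumes
perm_stats <- replicate(1000, diff_in_props(sample(singer), sex))

summary(perm_stats)

# Share of shuffles at least as large as the observed statistic
mean(perm_stats >= obs_stat)
```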

# Using permutation data, plot stat
ggplot(music_perm, aes(x = stat)) + 
  # Add a histogram layer
  geom_histogram(binwidth = 0.01) +
  # Using original data, add a vertical line at stat
  geom_vline(xintercept = diff_orig, color = "red")

Interpretation (no need to revise)

Q6 What is the calculated p-value? Interpret.

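The p-value is reported as 0 because none of the 1,000 permuted differences was greater than or equal to the observed difference of 1. This does not mean the true p-value is exactly 0; it means it is smaller than roughly 1/1000. In other words, if sex and singer preference were unrelated, a difference as large as the one observed would be very unlikely to arise by chance.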

Q7 Based on the p-value you interpreted in Q6, would you reject the null hypothesis at the standard 5% significance level and accept the alternative hypothesis that male students are more likely to prefer Imagine Dragons?

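Yes. The p-value (reported as 0, i.e. below about 0.001) is far below 0.05, so I would reject the null hypothesis of independence in favor of the alternative that male students are more likely than female students to prefer Imagine Dragons.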

# Calculate the p-value for the original dataset
music_perm %>%
  get_p_value(obs_stat = diff_orig, direction = "greater")
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1       0
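
Because the sample is small, the exact permutation p-value can also be computed, which confirms that the reported 0 is an approximation. Assuming the counts from the output are the full sample, the number of Dragons answers among the 15 males under independence follows a hypergeometric distribution (the same calculation as a one-sided Fisher's exact test). The observed difference of 1 requires all 15 to land among the males.

```r
# Counts from the printed output (assumed to be the full sample)
n_dragons <- 15  # students preferring Dragons
n_grande  <- 14  # students preferring Grande
n_male    <- 15  # size of the male group

# P(X >= 15) = P(X > 14), where X = Dragons answers among the males
p_exact <- phyper(14, m = n_dragons, n = n_grande, k = n_male,
                  lower.tail = FALSE)
p_exact

# A common conservative Monte Carlo estimate when 0 of 1000 shuffles
# reach the observed statistic: (0 + 1) / (1000 + 1)
p_mc_conservative <- (0 + 1) / (1000 + 1)
p_mc_conservative
```

Both values are far below 0.05, so the conclusion in Q7 does not depend on how the zero count is reported.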

Interpretation