Suppose that you wanted to investigate whether there is a gender difference in musical preference among Plymouth State University students. To investigate this question, you took a small sample of PSU students. The sample data is stored in music.csv.
Import music.csv from Moodle under Data Files, and test the hypothesis as described below.
# Load packages
library(dplyr)
library(ggplot2)
library(infer)
# Import data
music <- read.csv("/resources/rstudio/Business Statistics/data/music.csv")
head(music)
## singer sex
## 1 Grande female
## 2 Grande female
## 3 Grande female
## 4 Grande female
## 5 Grande female
## 6 Grande female
15 out of 15 males prefer Dragons, and all 14 females prefer Grande.
music %>%
# Count the rows by singer and sex
count(sex, singer)
## # A tibble: 2 x 3
## sex singer n
## <fct> <fct> <int>
## 1 female Grande 14
## 2 male Dragons 15
Interpretation
100 percent of male students in the sample (15 of 15) prefer Dragons.
# Find the proportion of each sex who prefer Dragons
music %>%
# Group by sex
group_by(sex) %>%
# Calculate the proportion preferring Dragons as a summary statistic
summarise(Dragons_prop = mean(singer == "Dragons"))
## # A tibble: 2 x 2
## sex Dragons_prop
## <fct> <dbl>
## 1 female 0
## 2 male 1
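The same proportions can be cross-checked with a base R contingency table. This is only a sketch: `music_demo` is a hypothetical stand-in rebuilt from the counts printed above (14 female/Grande, 15 male/Dragons), not the actual music.csv file.

```r
# Hypothetical stand-in for music.csv, rebuilt from the printed counts
music_demo <- data.frame(
  singer = c(rep("Grande", 14), rep("Dragons", 15)),
  sex    = c(rep("female", 14), rep("male", 15))
)

# Cross-tabulate sex (rows) by singer (columns)
tab <- table(music_demo$sex, music_demo$singer)

# Convert counts to row proportions: share of each sex preferring each singer
prop_tab <- prop.table(tab, margin = 1)
prop_tab
```

Reading across the male row of `prop_tab` gives the proportion of males preferring each singer, which should agree with `Dragons_prop` above.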
Interpretation
Male students prefer Dragons over Grande, whereas female students prefer Grande.
In this sample, every male student prefers Dragons and every female student prefers Grande.
A student who preferred the other performer would therefore be going against the norm for their sex.
# Calculate the observed difference in the proportion preferring Dragons
diff_orig <- music %>%
# Group by sex
group_by(sex) %>%
# Summarize to calculate the proportion preferring Dragons
summarise(prop_dragons = mean(singer == "Dragons")) %>%
# Summarize to calculate the difference (second group minus first)
summarise(stat = diff(prop_dragons)) %>%
# Extract the single value as a plain number
pull()
# See the result
diff_orig # male - female
## [1] 1
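The sign of this statistic depends on group order. A minimal sketch of why the result is male minus female, assuming the groups come back in alphabetical order (female, then male) as in the output above; the vector `prop_by_sex` is typed in by hand from the printed proportions.

```r
# Proportions preferring Dragons, in the order the groups are printed above
prop_by_sex <- c(female = 0 / 14, male = 15 / 15)

# diff() subtracts each element from the one after it, so this is male - female
diff_demo <- unname(diff(prop_by_sex))
diff_demo
```

If the groups were ordered the other way, the same call would return female minus male, so it is worth checking the order before interpreting the sign.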
# Create data frame of permuted differences in the proportion preferring Dragons
music_perm <- music %>%
# Specify variables: singer (response variable) and sex (explanatory variable)
specify(singer ~ sex, success = "Dragons") %>%
# Set null hypothesis as independence: singer preference does not depend on sex
hypothesize(null = "independence") %>%
# Shuffle the response variable, singer, one thousand times
generate(reps = 1000, type = "permute") %>%
# Calculate difference in proportion, male then female
calculate(stat = "diff in props", order = c("male", "female")) # male - female
music_perm
## # A tibble: 1,000 x 2
## replicate stat
## <int> <dbl>
## 1 1 0.0333
## 2 2 0.310
## 3 3 -0.243
## 4 4 0.0333
## 5 5 0.0333
## 6 6 0.310
## 7 7 -0.105
## 8 8 -0.105
## 9 9 0.0333
## 10 10 0.0333
## # ... with 990 more rows
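To make the permutation step less of a black box, here is a rough base R sketch of the same idea. It is not how infer is implemented internally; the data vectors are rebuilt from the printed counts, and the seed and the helper name `diff_in_props` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for the sample, rebuilt from the printed counts
singer <- c(rep("Grande", 14), rep("Dragons", 15))
sex    <- c(rep("female", 14), rep("male", 15))

# Difference in the proportion preferring Dragons: male - female
diff_in_props <- function(singer, sex) {
  mean(singer[sex == "male"] == "Dragons") -
    mean(singer[sex == "female"] == "Dragons")
}

# Observed statistic on the unshuffled data
obs_diff <- diff_in_props(singer, sex)

# Shuffle the singer labels 1,000 times, which breaks any link with sex,
# and recompute the statistic each time
perm_diffs <- replicate(1000, diff_in_props(sample(singer), sex))

# Summary of the simulated null distribution
summary(perm_diffs)
```

Each shuffle keeps the totals fixed (15 Dragons, 14 Grande) and only changes which students carry which label, so the spread of `perm_diffs` shows how large a difference chance alone tends to produce.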
# Using permutation data, plot stat
ggplot(music_perm, aes(x = stat)) +
# Add a histogram layer
geom_histogram(binwidth = 0.01) +
# Using original data, add a vertical line at stat
geom_vline(aes(xintercept = diff_orig), color = "red")
Interpretation (no need to revise)
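The p-value for my data is reported as 0. None of the 1,000 permuted differences was as large as the observed difference of 1, so the true p-value is very small rather than exactly 0.
Yes, I would reject the null hypothesis of independence, because the p-value is effectively 0.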
# Calculate the p-value for the original dataset
music_perm %>%
get_p_value(obs_stat = diff_orig, direction = "greater")
## # A tibble: 1 x 1
## p_value
## <dbl>
## 1 0
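A simulated p-value of exactly 0 only says that none of the 1,000 shuffles reached the observed difference. The sketch below, which assumes the counts printed earlier (29 students, 15 of them preferring Dragons), shows two ways to put a number on "very small". The add-one adjustment is a common convention for simulation-based p-values, not something `get_p_value()` applies.

```r
n_total   <- 29  # students in the sample (14 female + 15 male)
n_dragons <- 15  # students preferring Dragons

# Only one way of assigning the 15 "Dragons" labels puts all of them on the
# 15 males, so the exact one-sided permutation p-value is 1 over the number
# of possible assignments
exact_p <- 1 / choose(n_total, n_dragons)

# Conservative simulation-based value: add one to both the count of extreme
# shuffles and the number of shuffles
reps      <- 1000
n_extreme <- 0   # shuffles with a difference at least as large as observed
p_adjusted <- (n_extreme + 1) / (reps + 1)

c(exact = exact_p, adjusted = p_adjusted)
```

Either way the conclusion is the same: a difference this large is extremely unlikely if singer preference were independent of sex.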
Interpretation