For this WPA, we will use a dataset stored in a tab-separated text file called club.txt. This dataset contains results from a (fictional) survey of 300 attendees at one of three clubs in Konstanz on 8 December 2015. Each row represents one person.
Use the following code chunk to download the tab-separated text file club.txt from http://nathanieldphillips.com/wp-content/uploads/2015/12/club.txt and store it in an object called club.df. Make sure to include the full code in your Markdown document or it may not knit!
club.df <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/12/club.txt",
                      sep = "\t",
                      header = TRUE,
                      stringsAsFactors = FALSE)
Here is a description of the columns in the dataframe:
club - The name of the club the person was at
gender - The person’s gender
drinks - The number of drinks the person had that night
time - The amount of time (in minutes) the person stayed at the club
leavealone - A binary variable indicating whether or not the person left the club alone.
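Before running any tests, it can help to check that the columns look the way the description says. The following is a minimal sketch on a small made-up data frame (toy.df is hypothetical and its values are invented; it only shares its column names with club.df), so that it runs without the download:

```r
# Hypothetical stand-in for club.df: same column names, made-up values
toy.df <- data.frame(
  club = c("Kantine", "Barrys", "Blechnerei", "Kantine"),
  gender = c("M", "F", "F", "M"),
  drinks = c(3, 5, 2, 4),
  time = c(120, 95, 180, 60),
  leavealone = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

str(toy.df)                # column names and types
table(toy.df$club)         # number of people per club
table(toy.df$leavealone)   # counts of 0 and 1
summary(toy.df$time)       # range of minutes spent at the club
```

The same four calls should work on club.df once it has been loaded.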
The American Pirate Association has strict rules for how to display the result of hypothesis tests. Here are the formats for the three tests you will conduct in this WPA:
Q1
A friend of mine is convinced that all women are secretly werewolves. Seriously. Here’s his claim: the longer he’s at a club, the fewer women he sees. Why do women (werewolves) leave so early? He claims that they are drawn to the moonlight outside, so they can’t help but leave (him) before the night is over. Let’s test his claim using our data.
with(club.df, boxplot(time ~ gender))
with(club.df, aggregate(time ~ gender,
                        FUN = mean))
## gender time
## 1 F 134.4167
## 2 M 136.7292
q1.test <- with(club.df, t.test(time ~ gender))
# or
q1.test <- t.test(time ~ gender,
                  data = club.df)
q1.test
##
## Welch Two Sample t-test
##
## data: time by gender
## t = -0.38152, df = 297.55, p-value = 0.7031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.240836 9.615836
## sample estimates:
## mean in group F mean in group M
## 134.4167 136.7292
# Women and men do not seem to stay at clubs for different amounts of time, t(297.55) = -0.38, p = 0.70.
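The numbers in a conclusion like this one can be pulled out of the test object by name rather than copied by hand. Here is a minimal sketch, using R's built-in sleep data as a stand-in so that it runs without the download; demo.test and the other object names are illustrative and not part of the assignment:

```r
# Built-in 'sleep' data stands in for club.df (illustration only)
demo.test <- t.test(extra ~ group, data = sleep)

t.val <- round(unname(demo.test$statistic), 2)   # t statistic
df.val <- round(unname(demo.test$parameter), 2)  # Welch degrees of freedom
p.val <- round(demo.test$p.value, 2)             # two-tailed p-value

# Assemble the pieces into a t(df) = value, p = value string
apa.string <- paste0("t(", df.val, ") = ", t.val, ", p = ", p.val)
apa.string
```

Replacing demo.test with q1.test should give the values used in the sentence above.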
q1b.test <- with(subset(club.df, club == "Blechnerei"), t.test(time ~ gender))
# or
q1b.test <- t.test(time ~ gender,
                   data = club.df,
                   subset = club == "Blechnerei")
q1b.test
##
## Welch Two Sample t-test
##
## data: time by gender
## t = 0.062752, df = 104.1, p-value = 0.9501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -20.29240 21.61866
## sample estimates:
## mean in group F mean in group M
## 140.9180 140.2549
# Even when looking only at people who went to Blechnerei, there does not seem to be a difference between males and females, t(104.1) = 0.06, p = 0.95.
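The two versions separated by "# or" should give the same test, because both restrict the data to the same rows before t.test() runs. A small sketch that checks this on the built-in mtcars data (used here only as a stand-in for club.df; the object names are illustrative):

```r
# Three ways to run a t-test on a subset of rows (mtcars stands in for club.df)

# 1. subset() the data frame first, then use with()
test.a <- with(subset(mtcars, cyl == 4), t.test(mpg ~ am))

# 2. Use the data and subset arguments of the formula interface
test.b <- t.test(mpg ~ am,
                 data = mtcars,
                 subset = cyl == 4)

# 3. Logical indexing on the rows
test.c <- t.test(mpg ~ am,
                 data = mtcars[mtcars$cyl == 4, ])

# All three should report the same statistic and p-value
c(test.a$statistic, test.b$statistic, test.c$statistic)
c(test.a$p.value, test.b$p.value, test.c$p.value)
```

Which form to use is mostly a matter of taste; the data and subset arguments keep everything inside one call.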
Q2
Another friend has other club-related ideas. According to her, if you’re looking to meet a nice lady or gentleman at the club, you should definitely have a few drinks to help you loosen up. Do our data support her claim? Test this by answering the question: Do people who did not leave alone tend to drink more or less than people who did leave alone?
with(club.df, boxplot(drinks ~ leavealone))
with(club.df, aggregate(drinks ~ leavealone,
                        FUN = mean))
## leavealone drinks
## 1 0 3.577465
## 2 1 4.117904
q2.test <- with(club.df, t.test(drinks ~ leavealone))
# OR
q2.test <- t.test(drinks ~ leavealone,
                  data = club.df)
q2.test
##
## Welch Two Sample t-test
##
## data: drinks by leavealone
## t = -2.6253, df = 121.18, p-value = 0.009772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9479793 -0.1328990
## sample estimates:
## mean in group 0 mean in group 1
## 3.577465 4.117904
# People who leave alone tend to have more drinks than those who do not leave alone, t(121.18) = -2.63, p < .01
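The negative sign of t can look odd next to "more drinks". It arises because t.test() compares the first group to the second (here, group 0 minus group 1), and the confidence interval is centered on that same difference. A short arithmetic check using only numbers copied from the output above (the object names are illustrative):

```r
# Values copied from the q2.test output
mean.0 <- 3.577465               # mean drinks when leavealone == 0
mean.1 <- 4.117904               # mean drinks when leavealone == 1
ci <- c(-0.9479793, -0.1328990)  # 95 percent confidence interval

mean.diff <- mean.0 - mean.1     # first group minus second group
ci.mid <- mean(ci)               # midpoint of the interval

mean.diff
ci.mid
```

The two values should agree up to rounding, and both are negative because group 1 (left alone) has the higher mean.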
q2b.test <- with(subset(club.df, gender == "F"), t.test(drinks ~ leavealone))
# OR
q2b.test <- t.test(drinks ~ leavealone,
                   data = club.df,
                   subset = gender == "F")
q2b.test
##
## Welch Two Sample t-test
##
## data: drinks by leavealone
## t = -1.3791, df = 53.466, p-value = 0.1736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9844944 0.1821801
## sample estimates:
## mean in group 0 mean in group 1
## 3.352941 3.754098
# When we only look at females, there does not seem to be a significant relationship between drinks and leaving alone, t(53.47) = -1.38, p = 0.17
Q3
In a later chapter, we’ll learn how to write custom functions that make your programming life much easier. For example, you can write a custom function that takes a t.test object as an input, and spits out an APA style conclusion as an output! In this question, we’ll create a function called ‘apa’ that does just that:
apa <- function(test.object, tails = 2, sig.digits = 2, p.lb = .01) {

  # First letter of the test statistic's name (e.g. "t" or "X")
  statistic.id <- substr(names(test.object$statistic), start = 1, stop = 1)

  # Halve the p-value for a one-tailed test
  p.value <- test.object$p.value
  if (tails == 1) {p.value <- p.value / 2}

  # Report p-values below p.lb as "p < p.lb"; otherwise report the rounded value
  if (p.value < p.lb) {
    p.display <- paste("p < ", p.lb, " (", tails, "-tailed)", sep = "")
  } else {
    p.display <- paste("p = ", round(p.value, sig.digits), " (", tails, "-tailed)", sep = "")
  }

  # Defaults, overwritten below for the tests this function recognizes
  estimate.display <- ""
  add.par <- ""

  # Correlation test: report r
  if (grepl("product-moment", test.object$method)) {
    estimate.display <- paste("r = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }

  # Chi-squared test: no estimate, but add the sample size N
  if (grepl("Chi", test.object$method)) {
    estimate.display <- ""
    add.par <- paste(", N = ", sum(test.object$observed), sep = "")
  }

  # One-sample t-test: report the sample mean
  if (grepl("One Sample t-test", test.object$method)) {
    estimate.display <- paste("mean = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }

  # Two-sample t-test: report the second group mean minus the first,
  # so the sign is the opposite of the sign of t
  if (grepl("Two Sample t-test", test.object$method)) {
    estimate.display <- paste("mean difference = ", round(test.object$estimate[2] - test.object$estimate[1], sig.digits), ", ", sep = "")
  }

  return(paste(
    estimate.display,
    statistic.id,
    "(",
    round(test.object$parameter, sig.digits),
    add.par,
    ") = ",
    round(test.object$statistic, sig.digits),
    ", ",
    p.display,
    sep = ""
  ))
}
apa(q1.test)
## [1] "mean difference = 2.31, t(297.55) = -0.38, p = 0.7 (2-tailed)"
apa(q2.test)
## [1] "mean difference = 0.54, t(121.18) = -2.63, p < 0.01 (2-tailed)"
Do the results match what you wrote down for your answers to questions 1 and 2?
Yep!
Q4
Yet another friend of mine has some claims about club life. According to her, the main reason people drink at clubs isn’t to loosen up, it’s to stay awake! Is she right? Is there a relationship between the number of drinks a person has and how long they stay at the club?
with(club.df, plot(drinks, time))
with(club.df, aggregate(time ~ drinks,
                        FUN = mean))
## drinks time
## 1 0 85.40000
## 2 1 115.84615
## 3 2 97.03226
## 4 3 129.49123
## 5 4 136.85542
## 6 5 144.95522
## 7 6 155.31034
## 8 7 174.63636
## 9 8 194.00000
## 10 9 258.00000
q4.test <- with(club.df, cor.test(time, drinks))
#OR
q4.test <- cor.test(~ time + drinks,
                    data = club.df)
q4.test
##
## Pearson's product-moment correlation
##
## data: time and drinks
## t = 6.6984, df = 298, p-value = 1.05e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2591255 0.4562998
## sample estimates:
## cor
## 0.3617512
# There is a significant positive correlation between the number of drinks a person has and how long they spend at the club, r = 0.36, t(298) = 6.70, p < .01
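The t statistic that cor.test() reports for a Pearson correlation follows from r and the degrees of freedom alone, via t = r * sqrt(n - 2) / sqrt(1 - r^2). A short check using the values printed in the output above (the object names are illustrative):

```r
# Values copied from the q4.test output
r <- 0.3617512
deg.free <- 298   # n - 2, with n = 300 people

# t statistic implied by r and the degrees of freedom
t.from.r <- r * sqrt(deg.free) / sqrt(1 - r^2)
t.from.r

# Two-tailed p-value from the t distribution
p.from.t <- 2 * pt(abs(t.from.r), df = deg.free, lower.tail = FALSE)
p.from.t
```

Both results should match the t and p-value in the printed output up to rounding.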
q4b.test <- with(subset(club.df, gender == "F" & club == "Blechnerei"),
                 cor.test(time, drinks))
#OR
q4b.test <- cor.test(~ time + drinks,
                     data = club.df,
                     subset = gender == "F" & club == "Blechnerei")
q4b.test
##
## Pearson's product-moment correlation
##
## data: time and drinks
## t = 2.7597, df = 59, p-value = 0.007695
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09433171 0.54365162
## sample estimates:
## cor
## 0.3381205
# Looking only at women at Blechnerei, the results are very similar, r = 0.34, t(59) = 2.76, p < .01
Q5
I don’t know about Germany, but in the US, we refer to clubs with mostly guys as “sausage fests.” Maybe in Germany you’d call it a Wurstfest. Let’s see if any of the clubs were a Wurstfest on this day.
club.df$gender.log <- club.df$gender == "M"
genderagg <- with(club.df, aggregate(gender.log ~ club,
                                     FUN = mean))
If you’re not familiar with barplots, here’s an example of how to use barplot():
barplot(height = c(5, 3, 6, 3, 1),
        names.arg = 1:5,
        col = "white")
barplot(height = genderagg$gender.log,
        names.arg = c(1, 2, 3),
        col = c("gray", "gray", "royalblue3"))
q5.test <- with(club.df, chisq.test(club, gender))
q5.test
##
## Pearson's Chi-squared test
##
## data: club and gender
## X-squared = 13.74, df = 2, p-value = 0.001038
apa(q5.test)
## [1] "X(2, N = 300) = 13.74, p < 0.01 (2-tailed)"
# There is a significant relationship between clubs and gender, X(2, N = 300) = 13.74, p < .01 (2-tailed)
q5b.test <- with(subset(club.df, club %in% c("Kantine", "Barrys")), chisq.test(club, gender))
apa(q5b.test)
## [1] "X(1, N = 188) = 12.24, p < 0.01 (2-tailed)"
# Even when looking only at Kantine and Barrys, there is a significant relationship between clubs and gender, X(1, N = 188) = 12.24, p < .01 (2-tailed). Since we know from before that Barrys has a higher proportion of females, we can conclude that there is a significantly higher proportion of men at Kantine than at Barrys. Kantine is thus a true Wurstfest.
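A chi-squared test only says that club and gender are related, not which club has more men. To read off the direction, compare the observed counts with the expected counts, or look at the proportions within each club. A minimal sketch on made-up counts (toy.tab, ClubA, and ClubB are hypothetical and are not the survey data):

```r
# Hypothetical counts, invented for illustration
toy.tab <- matrix(c(30, 70,
                    60, 40),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(club = c("ClubA", "ClubB"),
                                  gender = c("F", "M")))

toy.test <- chisq.test(toy.tab)

toy.test$observed                # counts that went into the test
round(toy.test$expected, 1)      # counts expected if club and gender were unrelated
prop.table(toy.tab, margin = 1)  # share of F and M within each club
```

With the real data, with(club.df, table(club, gender)) should produce a table that can be passed to prop.table() in the same way.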
Q6
Who is more likely to leave a club alone, men or women?
genderagg <- with(club.df, aggregate(leavealone ~ gender,
                                     FUN = mean))
barplot(height = genderagg$leavealone,
        names.arg = genderagg$gender)
q6.test <- with(club.df, chisq.test(gender, leavealone))
q6.test
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: gender and leavealone
## X-squared = 0.43293, df = 1, p-value = 0.5106
apa(q6.test)
## [1] "X(1, N = 300) = 0.43, p = 0.51 (2-tailed)"
# There is no significant relationship between gender and whether or not people go home alone, X(1, N = 300) = 0.43, p = 0.51 (2-tailed)
q6b.test <- with(subset(club.df, time > 60),
                 chisq.test(gender, leavealone))
q6b.test
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: gender and leavealone
## X-squared = 0.88492, df = 1, p-value = 0.3469
apa(q6b.test)
## [1] "X(1, N = 277) = 0.88, p = 0.35 (2-tailed)"
# Even when looking only at people who stayed more than 60 minutes, there is no significant relationship between gender and whether or not people go home alone, X(1, N = 277) = 0.88, p = 0.35 (2-tailed)
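The Q6 output is labelled "with Yates' continuity correction", while the Q5 output for three clubs is not. This is because chisq.test() applies the correction by default only to 2 x 2 tables. The sketch below compares the corrected and uncorrected tests on made-up counts (toy.tab is hypothetical and is not the survey data):

```r
# Hypothetical 2 x 2 counts, invented for illustration
toy.tab <- matrix(c(20, 80,
                    30, 70),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(gender = c("F", "M"),
                                  leavealone = c("0", "1")))

corrected <- chisq.test(toy.tab)                     # default: correct = TRUE
uncorrected <- chisq.test(toy.tab, correct = FALSE)  # plain Pearson statistic

corrected$method
uncorrected$method

# The correction shrinks the statistic, which makes the test more conservative
c(corrected = unname(corrected$statistic),
  uncorrected = unname(uncorrected$statistic))
```

Whichever version is reported, it helps to say which one was used, since the two statistics differ.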