club <- read.delim("~/Dropbox/RSeminar/club.txt")
Q1
A friend of mine is convinced that all women are secretly Werewolves. Seriously. Here’s his claim: the longer he’s at a club, the fewer women he sees. Why do women (werewolves) leave so early? He claims that they are drawn to the moonlight outside, so they can’t help but leave (him) before the night is over. Let’s test his claim using our data.
Create a plot (e.g. boxplot or beanplot) showing the distribution of club times for males and females
library("beanplot")
beanplot(club$time ~ club$gender,
main = "Time Distribution",
xlab = "Gender",
ylab = "Time")
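If the beanplot package is not installed, base R's boxplot() accepts the same formula interface. A minimal sketch on made-up data (the `toy` data frame and its values are invented for illustration and are not taken from club.txt):

```r
# Made-up data standing in for club.txt (values are illustrative only)
toy <- data.frame(
  gender = rep(c("F", "M"), each = 5),
  time   = c(120, 95, 150, 110, 140, 130, 100, 160, 125, 135)
)

# Same formula as the beanplot call, with the data frame passed once via `data`
bp <- boxplot(time ~ gender, data = toy,
              main = "Time Distribution",
              xlab = "Gender",
              ylab = "Time (minutes)")

# boxplot() invisibly returns the plotted summaries; bp$names holds the group labels
bp$names
```

Passing `data = toy` instead of writing `toy$time ~ toy$gender` keeps the axis labels and the formula shorter.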
Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that men and women stayed at the club(s).
aggregate(time~gender,data = club, mean)
## gender time
## 1 F 134.4167
## 2 M 136.7292
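The same grouped mean can be computed in more than one way. A sketch on made-up data (the `toy` data frame is hypothetical) comparing aggregate() with base R's tapply():

```r
# Invented data, not the club data
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M"),
  time   = c(100, 120, 140, 90, 150, 180)
)

# Formula interface: returns a data frame with one row per group
agg <- aggregate(time ~ gender, data = toy, FUN = mean)

# tapply() returns a named vector instead of a data frame
tap <- tapply(toy$time, toy$gender, mean)

agg
tap
```

aggregate() is convenient when the result feeds into further data-frame operations; tapply() is handy for a quick look.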
Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of time women and men spend at clubs. Save the result as an object called q1.test
q1.test <- t.test(time ~ gender, data = club, alternative = "two.sided")
Write your conclusion in APA format. Be sure to address my friend’s claim that women are werewolves.
q1.test$statistic
## t
## -0.3815224
q1.test$parameter
## df
## 297.5547
q1.test$p.value
## [1] 0.703088
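Women (mean = 134.42 min) and men (mean = 136.73 min) did not differ significantly in the time they spent at the club, t(297.55) = -0.38, p = .703. The data give no support to my friend’s claim that women (werewolves) leave earlier.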
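The fractional degrees of freedom arise because t.test() defaults to Welch's test (var.equal = FALSE), which does not assume equal variances. A sketch with made-up vectors `x` and `y` (not the club data) that reproduces the statistic and the Welch-Satterthwaite degrees of freedom by hand:

```r
# Invented samples of unequal size
x <- c(120, 95, 150, 110, 140, 132)
y <- c(130, 100, 160, 125, 135, 90, 170)

# Squared standard error of each group mean
se2.x <- var(x) / length(x)
se2.y <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t.manual  <- (mean(x) - mean(y)) / sqrt(se2.x + se2.y)
df.manual <- (se2.x + se2.y)^2 /
  (se2.x^2 / (length(x) - 1) + se2.y^2 / (length(y) - 1))

welch <- t.test(x, y)  # var.equal = FALSE is the default

c(manual = t.manual, t.test = unname(welch$statistic))
c(manual = df.manual, t.test = unname(welch$parameter))
```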
Do the results change if you only look at people who were at the Blechnerei? Using only the Blechnerei data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)
# subset is evaluated inside data, so club here is the column club$club
q1.test <- t.test(time ~ gender, subset = club == "Blechnerei", data = club, alternative = "two.sided")
q1.test$statistic
## t
## 0.06275184
q1.test$parameter
## df
## 104.101
q1.test$p.value
## [1] 0.9500844
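At Blechnerei alone the result does not change: women and men did not differ significantly in the time they spent at the club, t(104.10) = 0.06, p = .950.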
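The `subset` argument of the formula method is evaluated inside `data`, so a condition on a column can be written without the `$` prefix. A sketch on made-up data (the `toy` data frame and its values are hypothetical) showing that this is equivalent to filtering with subset() first:

```r
# Invented data with two clubs
toy <- data.frame(
  club   = rep(c("Blechnerei", "Kantine"), each = 6),
  gender = rep(c("F", "M"), times = 6),
  time   = c(100, 120, 130, 90, 150, 160, 80, 95, 140, 170, 110, 105)
)

# Option 1: let t.test() filter rows via its subset argument
a <- t.test(time ~ gender, data = toy, subset = club == "Blechnerei")

# Option 2: filter with subset() first, then run the test
b <- t.test(time ~ gender, data = subset(toy, club == "Blechnerei"))

all.equal(a$statistic, b$statistic)
```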
Q2
Another friend has other club-related ideas. According to her, if you’re looking to meet a nice lady or gentleman at the club, you should definitely have a few drinks to help you loosen up. Do our data support her claim? Test this by answering the question: Do people who did not leave alone tend to drink more or less than people who did leave alone?
Create a plot (e.g. boxplot or beanplot) showing the distribution of drinks for people that did and did not leave alone
beanplot(club$drinks ~ club$leavealone,
main = "Drink Distribution",
xlab = "leavealone",
ylab = "Drinks")
Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.
aggregate(drinks~leavealone,data = club, mean)
## leavealone drinks
## 1 0 3.577465
## 2 1 4.117904
Conduct a two-tailed t-test testing whether or not there is a significant difference in the number of drinks people had when they went home alone versus not alone. Save the result as an object called q2.test
q2.test <- t.test(drinks ~ leavealone, data = club, alternative = "two.sided")
q2.test$statistic
## t
## -2.625326
q2.test$parameter
## df
## 121.1829
q2.test$p.value
## [1] 0.009772036
Write your conclusion in APA format.
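Reading leavealone = 1 as “left alone”, people who left alone (mean = 4.12 drinks) drank significantly more than people who did not leave alone (mean = 3.58 drinks), t(121.18) = -2.63, p = .010. On that reading, the data do not support her claim.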
Do the results change if you ignore Males and only test Females? Using only the Female data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)
q2.test <- t.test(drinks ~ leavealone, subset = gender == "F", data = club, alternative = "two.sided")
q2.test$statistic
## t
## -1.379058
q2.test$parameter
## df
## 53.46589
q2.test$p.value
## [1] 0.1736189
Among women only, the difference in drinks is no longer significant, t(53.47) = -1.38, p = .174.
Q3
In a later chapter, we’ll learn how to write custom functions that make a lot of your programming life much easier. For example, you can write a custom function that takes a t.test object as an input, and spits out an APA style conclusion as an output! In this question, we’ll create a function called ‘apa’ that does just that:
Execute all the code in the following chunk to save the new function.
apa <- function(test.object, tails = 2, sig.digits = 2, p.lb = .01) {
  # First letter of the statistic's name, e.g. "t" or "X" (from "X-squared")
  statistic.id <- substr(names(test.object$statistic), start = 1, stop = 1)
  p.value <- test.object$p.value
  if (tails == 1) {p.value <- p.value / 2}
  # Report small p-values as "p < p.lb"; if/else also covers p.value == p.lb
  if (p.value < p.lb) {
    p.display <- paste("p < ", p.lb, " (", tails, "-tailed)", sep = "")
  } else {
    p.display <- paste("p = ", round(p.value, sig.digits), " (", tails, "-tailed)", sep = "")
  }
  # Defaults so that unsupported test types do not fail on a missing variable
  estimate.display <- ""
  add.par <- ""
  # Correlation test: report r
  if (grepl("product-moment", test.object$method)) {
    estimate.display <- paste("r = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }
  # Chi-square test: no estimate, but add the sample size after the df
  if (grepl("Chi", test.object$method)) {
    estimate.display <- ""
    add.par <- paste(", N = ", sum(test.object$observed), sep = "")
  }
  # One-sample t-test: report the sample mean
  if (grepl("One Sample t-test", test.object$method)) {
    estimate.display <- paste("mean = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }
  # Two-sample t-test (including Welch): second group mean minus first
  if (grepl("Two Sample t-test", test.object$method)) {
    estimate.display <- paste("mean difference = ", round(test.object$estimate[2] - test.object$estimate[1], sig.digits), ", ", sep = "")
  }
  return(paste(
    estimate.display,
    statistic.id,
    "(",
    round(test.object$parameter, sig.digits),
    add.par,
    ") = ",
    round(test.object$statistic, sig.digits),
    ", ",
    p.display,
    sep = ""
  ))
}
Now, try the function on your previous test results from Q1 and Q2 by executing the following two lines of code.
apa(q1.test)
## [1] "mean difference = -0.66, t(104.1) = 0.06, p = 0.95 (2-tailed)"
apa(q2.test)
## [1] "mean difference = 0.4, t(53.47) = -1.38, p = 0.17 (2-tailed)"
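Strict APA style drops the leading zero from p values and reports very small ones as p < .001, which apa() does not do. A hypothetical helper, here called `apa.p` (it is not part of the original function, and the three-decimal default is an assumption), could look like this:

```r
# Hypothetical helper, not part of apa(): format a p value in APA style
apa.p <- function(p, digits = 3) {
  if (p < 0.001) {
    return("p < .001")
  }
  # formatC() keeps trailing zeros; sub() strips the leading zero
  paste0("p = ", sub("^0", "", formatC(p, format = "f", digits = digits)))
}

apa.p(0.703088)     # "p = .703"
apa.p(0.009772036)  # "p = .010"
apa.p(0.00001)      # "p < .001"
```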
Q4
Yet another friend of mine has some claims about club life. According to her, the main reason people drink at clubs isn’t to loosen up, it’s to stay awake! Is she right? Is there a relationship between the number of drinks a person has and how long they stay at the club?
Create a plot (e.g. scatterplot) showing the relationship between drinks and time.
plot(drinks~time, data = club, main = "Drinks over Time Distribution")
Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that people stay at the club for each drink amount.
aggregate(time~drinks, data = club, mean)
## drinks time
## 1 0 85.40000
## 2 1 115.84615
## 3 2 97.03226
## 4 3 129.49123
## 5 4 136.85542
## 6 5 144.95522
## 7 6 155.31034
## 8 7 174.63636
## 9 8 194.00000
## 10 9 258.00000
Is the relationship significant? Conduct a correlation test and save the result as an object called q4.test
q4.test <- cor.test(club$time,club$drinks)
Write your result in APA format.
apa(q4.test)
## [1] "r = 0.36, t(298) = 6.7, p < 0.01 (2-tailed)"
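The t statistic that cor.test() reports for a Pearson correlation is a function of r and the degrees of freedom: t = r * sqrt(df) / sqrt(1 - r^2), with df = n - 2. A sketch on made-up vectors (`drinks` and `time` here are invented values, not the club data):

```r
# Invented data: 8 observations
drinks <- c(0, 1, 2, 3, 4, 5, 6, 7)
time   <- c(80, 120, 95, 130, 140, 135, 160, 175)

ct <- cor.test(time, drinks)

r  <- unname(ct$estimate)
df <- unname(ct$parameter)   # n - 2

# Convert r to the t statistic by hand
t.manual <- r * sqrt(df) / sqrt(1 - r^2)

c(manual = t.manual, cor.test = unname(ct$statistic))
```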
Repeat the test, but only for females at Blechnerei. Do you get the same conclusion? Write the results of this test in APA format.
# Subset once, then reuse the result for both variables
blech.f <- subset(club, gender == "F" & club == "Blechnerei")
q4.test <- cor.test(blech.f$time, blech.f$drinks)
apa(q4.test)
## [1] "r = 0.34, t(59) = 2.76, p < 0.01 (2-tailed)"
Q5
I don’t know about Germany, but in the US, we refer to clubs with mostly guys as “sausage fests.” Maybe in Germany you’d call it a Wurstfest. Let’s see if any of the clubs were a Wurstfest on this day.
What is the percentage of Males in each club? (Hint: Calculate a new binary variable called gender.log with 1 meaning Male and 0 meaning Female. Then, use grouped aggregation to calculate the percentage of 1s in gender.log for each club)
# Store the indicator in the data frame as 1 (male) / 0 (female)
club$gender.log <- as.numeric(club$gender == "M")
agg <- aggregate(gender.log ~ club, data = club, FUN = mean)
Plot the results using a barplot. Set the height argument to be the percentage of males in each club, and set the names argument to be the names of the clubs. (For bonus points, set the color of the bars for any Wurstfests to “royalblue3”.)
barplot(height = agg$gender.log,
        names.arg = agg$club,
        col = "white"
)
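For the bonus, one color per bar can be chosen with ifelse(). The sketch below uses made-up proportions and assumes that a Wurstfest means more than 50% males; both the `toy.agg` values and that cutoff are assumptions, not results from the club data:

```r
# Invented proportions of males per club
toy.agg <- data.frame(
  club       = c("Barrys", "Blechnerei", "Kantine"),
  gender.log = c(0.65, 0.48, 0.40)
)

# One color per bar: highlight clubs where males are the majority
bar.cols <- ifelse(toy.agg$gender.log > 0.5, "royalblue3", "white")

barplot(height = toy.agg$gender.log,
        names.arg = toy.agg$club,
        col = bar.cols,
        ylab = "Proportion male")
```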
Is there a significant relationship between clubs and gender? Answer this using a chi-square test. Run the test and save the result in an object called q4.test
q4.test <- chisq.test(club$club, club$gender)
What is your conclusion in APA format?
apa(q4.test)
## [1] "X(2, N = 300) = 13.74, p < 0.01 (2-tailed)"
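The chi-square statistic compares the observed cell counts with the counts expected under independence (row total times column total, divided by N). A sketch on an invented 3 x 2 table (the counts and the club labels A, B, C are made up):

```r
# Invented counts: 3 clubs by 2 genders
obs <- matrix(c(30, 20,
                25, 35,
                15, 40),
              nrow = 3, byrow = TRUE,
              dimnames = list(club = c("A", "B", "C"),
                              gender = c("F", "M")))

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic and its degrees of freedom
x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

ct <- chisq.test(obs)
c(manual = x2, chisq.test = unname(ct$statistic))
```

With three clubs and two genders the degrees of freedom are (3 - 1) * (2 - 1) = 2, matching the df in the APA string above.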
Was there a significant difference between just Kantine and Barry’s? Do the test again using only data from these two clubs. Report your results in APA format.
# Subset once, then reuse the result for both variables
two.clubs <- subset(club, club %in% c("Kantine", "Barrys"))
q4.test <- chisq.test(two.clubs$club, two.clubs$gender)
apa(q4.test)
## [1] "X(1, N = 188) = 12.24, p < 0.01 (2-tailed)"
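With only two clubs the table is 2 x 2, and chisq.test() then applies Yates' continuity correction by default (correct = TRUE), so the reported statistic is somewhat smaller than the uncorrected Pearson value. A sketch on invented counts:

```r
# Invented 2 x 2 table
obs <- matrix(c(40, 20,
                25, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(club = c("A", "B"),
                              gender = c("F", "M")))

corrected   <- chisq.test(obs)                   # Yates' correction (default)
uncorrected <- chisq.test(obs, correct = FALSE)  # plain Pearson statistic

c(corrected   = unname(corrected$statistic),
  uncorrected = unname(uncorrected$statistic))
```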
Q6
Who is more likely to leave a club alone, men or women? Calculate the percentage of men and women who leave alone.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
lag <- club %>%
  group_by(gender) %>%
  summarise(
    leavealone = mean(leavealone),
    number = n())
Plot the results using a barplot. Set the height argument to be the percentage of people who leave alone for each gender, and set the names argument to be the gender.
barplot(height = lag$leavealone,
        names.arg = lag$gender,
        col = "white"
)
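The same proportions can also be read off a cross-tabulation with prop.table(). A sketch on made-up data (the `toy` data frame is hypothetical, and leavealone = 1 is assumed to mean "left alone"):

```r
# Invented data: 4 women, 4 men
toy <- data.frame(
  gender     = c("F", "F", "F", "F", "M", "M", "M", "M"),
  leavealone = c(1, 0, 1, 1, 1, 0, 0, 1)
)

tab <- table(toy$gender, toy$leavealone)

# margin = 1 makes each row (gender) sum to 1
props <- prop.table(tab, margin = 1)

props[, "1"]   # proportion with leavealone == 1, by gender
```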
Is there a significant relationship between gender and leaving alone? Answer this using a chi-square test. Run the test and save the result in an object called q6.test
q6.test <- chisq.test(club$leavealone, club$gender)
What is your conclusion in APA format?
apa(q6.test)
## [1] "X(1, N = 300) = 0.43, p = 0.51 (2-tailed)"
Does your conclusion hold if you only include people who stayed at the club for more than 60 minutes? Repeat the test on these data and report your conclusions in APA format.
# Subset once, then reuse the result for both variables
over.60 <- subset(club, time > 60)
q6.test <- chisq.test(over.60$leavealone, over.60$gender)
apa(q6.test)
## [1] "X(1, N = 277) = 0.88, p = 0.35 (2-tailed)"