A. Create a plot (e.g. boxplot or beanplot) showing the distribution of club times for males and females
boxplot(club.df$time~club.df$gender)
B. Using grouped aggregation (e.g. aggregate or dplyr), calculate the mean number of minutes that men and women stayed at the club(s)
aggregate(club.df$time~club.df$gender, FUN = mean, data = club.df)
## club.df$gender club.df$time
## 1 F 134.4167
## 2 M 136.7292
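As a side note, the column headers above read `club.df$gender` and `club.df$time` because the formula spells out the data frame on both sides. A minimal sketch, using a small made-up data frame (`toy.df` is hypothetical and not part of the assignment data), shows that bare column names plus the `data` argument give cleaner headers:

```r
# Hypothetical toy data, only for illustration
toy.df <- data.frame(
  gender = c("F", "F", "M", "M"),
  time   = c(120, 140, 130, 150)
)

# Bare column names in the formula; the data frame goes in `data`
toy.means <- aggregate(time ~ gender, data = toy.df, FUN = mean)
toy.means
```

The result has the columns `gender` and `time`, which is easier to read and to reuse in later code.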
C. Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of time women and men spend at clubs. Save the result as an object called q1.test
q1.test <- t.test(club.df$time~club.df$gender, alternative = "two.sided" )
q1.test
##
## Welch Two Sample t-test
##
## data: club.df$time by club.df$gender
## t = -0.38152, df = 297.55, p-value = 0.7031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.240836 9.615836
## sample estimates:
## mean in group F mean in group M
## 134.4167 136.7292
D. Write your conclusion in APA format. Be sure to address my friend’s claim that women are werewolves.
There is no significant difference between the mean time women (M = 134.42) and men (M = 136.73) spend at a club (t(297.55) = -0.38, p = 0.70).
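The numbers in such a sentence do not have to be copied by hand. The object returned by `t.test()` is a list with the components `statistic`, `parameter` (the degrees of freedom) and `p.value`. The sketch below builds the string from those components; `apa.sketch` is a hypothetical helper written for this illustration, and the list `reported` only mimics a test object using the values printed above.

```r
# Hypothetical helper: format the key numbers of a t-test result
apa.sketch <- function(test) {
  sprintf("t(%.2f) = %.2f, p = %.2f",
          test$parameter, test$statistic, test$p.value)
}

# Stand-in for q1.test, filled with the values from the output above
reported <- list(parameter = 297.55, statistic = -0.38152, p.value = 0.7031)
apa.sketch(reported)

# The same helper applied to a real t.test object on simulated toy data
set.seed(1)
toy.df <- data.frame(
  gender = rep(c("F", "M"), each = 20),
  time   = c(rnorm(20, mean = 130, sd = 40), rnorm(20, mean = 135, sd = 40))
)
toy.test <- t.test(time ~ gender, data = toy.df)
apa.sketch(toy.test)
```

With the assignment data, `apa.sketch(q1.test)` should give the same kind of string.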
E. Do the results change if you only look at people who were at the Blechnerei? Using only the Blechnerei data, repeat the test and write your conclusion in APA format
subset.club <- subset(club.df, club == "Blechnerei", select = c(gender, time))
q1.test.blech <- t.test(subset.club$time~subset.club$gender, alternative = "two.sided" )
q1.test.blech
##
## Welch Two Sample t-test
##
## data: subset.club$time by subset.club$gender
## t = 0.062752, df = 104.1, p-value = 0.9501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -20.29240 21.61866
## sample estimates:
## mean in group F mean in group M
## 140.9180 140.2549
There is still no significant difference between the genders (t(104.10) = 0.06, p = 0.95).
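The subsetting step can also be folded into the test call, because the formula method of `t.test()` accepts a `subset` argument. The sketch below uses simulated toy data (the club name "Other" and all values are made up) to check that both routes give the same statistic:

```r
# Simulated toy data, only for illustration
set.seed(2)
toy.df <- data.frame(
  club   = rep(c("Blechnerei", "Other"), each = 40),
  gender = rep(c("F", "M"), times = 40),
  time   = rnorm(80, mean = 135, sd = 50)
)

# Route 1: subset the data frame first, then test
test.a <- t.test(time ~ gender, data = subset(toy.df, club == "Blechnerei"))

# Route 2: let t.test() do the subsetting
test.b <- t.test(time ~ gender, data = toy.df, subset = club == "Blechnerei")

# Both routes should agree
all.equal(unname(test.a$statistic), unname(test.b$statistic))
```

The second route avoids creating an extra data frame, at the cost of a slightly longer test call.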
Create a plot (e.g. boxplot or beanplot) showing the distribution of drinks for people that did and did not leave alone.
beanplot(club.df$drinks~club.df$leavealone)
Using grouped aggregation (e.g. aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.
club.df %>%
group_by(leavealone) %>%
summarise(
q2.mean = mean(drinks)
)
## Source: local data frame [2 x 2]
##
## leavealone q2.mean
## (int) (dbl)
## 1 0 3.577465
## 2 1 4.117904
Conduct a two-tailed t-test testing whether or not there is a significant difference in the number of drinks people had when they went home alone versus not alone. Save the result as an object called q2.test.
q2.test <- t.test(club.df$drinks~club.df$leavealone, alternative = "two.sided" )
q2.test
##
## Welch Two Sample t-test
##
## data: club.df$drinks by club.df$leavealone
## t = -2.6253, df = 121.18, p-value = 0.009772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9479793 -0.1328990
## sample estimates:
## mean in group 0 mean in group 1
## 3.577465 4.117904
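Write your conclusion in APA format.
The friend is right at both the 5% and the 1% significance level: the mean number of drinks differs significantly between the two leavealone groups (group 0: M = 3.58, group 1: M = 4.12; t(121.18) = -2.63, p = 0.0098).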
Do the results change if you ignore Males and only test Females? Using only the Female data, repeat the test and write your conclusion in APA format
subset.club2 <- subset(club.df, gender == "F", select = c(drinks, leavealone))
q2.test.f <- t.test(subset.club2$drinks~subset.club2$leavealone, alternative = "two.sided" )
q2.test.f
##
## Welch Two Sample t-test
##
## data: subset.club2$drinks by subset.club2$leavealone
## t = -1.3791, df = 53.466, p-value = 0.1736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9844944 0.1821801
## sample estimates:
## mean in group 0 mean in group 1
## 3.352941 3.754098
The null hypothesis cannot be rejected (t(53.47) = -1.38, p = 0.17). Among women, there is no significant difference in the number of drinks between those who left alone and those who did not.
apa(q1.test)
## [1] "mean difference = 2.31, t(297.55) = -0.38, p = 0.7 (2-tailed)"
apa(q2.test)
## [1] "mean difference = 0.54, t(121.18) = -2.63, p < 0.01 (2-tailed)"
The results look more or less the same as the conclusions written by hand. However, using the apa function makes it a lot easier and the output looks better.
Create a plot (e.g. scatterplot) showing the relationship between drinks and time.
plot(club.df$time~club.df$drinks)
abline(lm(time~drinks, data = club.df))
Using grouped aggregation (e.g. aggregate or dplyr), calculate the mean number of minutes that people stay at the club for each drink amount.
aggregate(club.df$time~club.df$drinks, FUN = mean, data = club.df)
## club.df$drinks club.df$time
## 1 0 85.40000
## 2 1 115.84615
## 3 2 97.03226
## 4 3 129.49123
## 5 4 136.85542
## 6 5 144.95522
## 7 6 155.31034
## 8 7 174.63636
## 9 8 194.00000
## 10 9 258.00000
Is the difference significant? Conduct a correlation test and save the result as an object called q4.test
q4.test <- cor.test(club.df$time, club.df$drinks)
q4.test
##
## Pearson's product-moment correlation
##
## data: club.df$time and club.df$drinks
## t = 6.6984, df = 298, p-value = 1.05e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2591255 0.4562998
## sample estimates:
## cor
## 0.3617512
Write your result in APA format.
apa(q4.test)
## [1] "r = 0.36, t(298) = 6.7, p < 0.01 (2-tailed)"
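As a sanity check, the t value reported by `cor.test()` can be recomputed from the correlation and the degrees of freedom with the standard relation t = r * sqrt(df) / sqrt(1 - r^2). The sketch below uses the values printed in the output above; the variable names are chosen only for this illustration.

```r
# Values copied from the cor.test output above
r   <- 0.3617512
dof <- 298

# t statistic implied by r and the degrees of freedom
t.from.r <- r * sqrt(dof) / sqrt(1 - r^2)
t.from.r

# Matching two-tailed p-value from the t distribution
p.from.t <- 2 * pt(-abs(t.from.r), df = dof)
p.from.t
```

The recomputed t agrees with the reported t = 6.6984 up to rounding, and the p-value is far below 0.01, in line with the apa output.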