Q1

A. Create a plot (e.g. boxplot or beanplot) showing the distribution of club times for males and females

boxplot(club.df$time~club.df$gender)

B. Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that men and women stayed at the club(s)

aggregate(club.df$time~club.df$gender, FUN = mean, data = club.df)
##   club.df$gender club.df$time
## 1              F     134.4167
## 2              M     136.7292
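The same grouped mean can also be written with bare column names in the formula, which gives cleaner column headers in the result. A minimal sketch on a small made-up data frame (toy.df and its values are hypothetical, not the club data):

```r
# Hypothetical toy data standing in for club.df (values are made up)
toy.df <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M"),
  time   = c(120, 130, 140, 100, 150, 170)
)

# Bare column names are looked up in `data`, so the result columns
# are named gender and time instead of toy.df$gender and toy.df$time
toy.means <- aggregate(time ~ gender, data = toy.df, FUN = mean)
toy.means
```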

C. Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of time women and men spend at clubs. Save the result as an object called q1.test

q1.test <- t.test(club.df$time~club.df$gender, alternative = "two.sided" )
q1.test
## 
##  Welch Two Sample t-test
## 
## data:  club.df$time by club.df$gender
## t = -0.38152, df = 297.55, p-value = 0.7031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.240836   9.615836
## sample estimates:
## mean in group F mean in group M 
##        134.4167        136.7292

D. Write your conclusion in APA format. Be sure to address my friend’s claim that women are werewolves.

There was no significant difference in the time spent at clubs between women (M = 134.42 minutes) and men (M = 136.73 minutes), t(297.55) = -0.38, p = 0.70.

E. Do the results change if you only look at people who were at the Blechnerei? Using only the Blechnerei data, repeat the test and write your conclusion in APA format

subset.club <- subset(club.df, club == "Blechnerei", select = c(gender, time))
q1.test.blech <-  t.test(subset.club$time~subset.club$gender, alternative = "two.sided" )
q1.test.blech
## 
##  Welch Two Sample t-test
## 
## data:  subset.club$time by subset.club$gender
## t = 0.062752, df = 104.1, p-value = 0.9501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -20.29240  21.61866
## sample estimates:
## mean in group F mean in group M 
##        140.9180        140.2549

There is still no significant difference between the genders (t(104.10) = 0.06, p = 0.95).
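As an alternative to building a separate data frame first, the formula interface of t.test accepts a subset argument. The sketch below uses made-up data (toy.df, the club name "Other", and all values are hypothetical) and checks that both routes give the same test:

```r
set.seed(1)  # reproducible made-up data
toy.df <- data.frame(
  club   = rep(c("Blechnerei", "Other"), each = 20),
  gender = rep(c("F", "M"), times = 20),
  time   = round(rnorm(40, mean = 135, sd = 40))
)

# Restrict the test to one club without creating a separate data frame
toy.test <- t.test(time ~ gender, data = toy.df,
                   subset = club == "Blechnerei",
                   alternative = "two.sided")

# Same test on an explicitly subsetted data frame, for comparison
toy.blech  <- subset(toy.df, club == "Blechnerei")
toy.test.2 <- t.test(time ~ gender, data = toy.blech)

# TRUE if both approaches yield the same t statistic
all.equal(toy.test$statistic, toy.test.2$statistic)
```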

Q2

Create a plot (e.g. boxplot or beanplot) showing the distribution of drinks for people that did and did not leave alone.

beanplot(club.df$drinks~club.df$leavealone)

Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.

club.df %>%
  group_by(leavealone) %>%
  summarise(
    q2.mean = mean(drinks)
  )
## Source: local data frame [2 x 2]
## 
##   leavealone  q2.mean
##        (int)    (dbl)
## 1          0 3.577465
## 2          1 4.117904

Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of drinks people had when they went home alone versus not alone. Save the result as an object called q2.test.

q2.test <- t.test(club.df$drinks~club.df$leavealone, alternative = "two.sided" )
q2.test
## 
##  Welch Two Sample t-test
## 
## data:  club.df$drinks by club.df$leavealone
## t = -2.6253, df = 121.18, p-value = 0.009772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9479793 -0.1328990
## sample estimates:
## mean in group 0 mean in group 1 
##        3.577465        4.117904

Write your conclusion in APA format.

The difference is significant at both the 5% and the 1% level, so the friend is right (t(121.18) = -2.63, p = 0.0098).
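The decision rule can be checked directly by comparing the p-value with each significance level. A minimal sketch, with the p-value typed in by hand from the q2.test output (the variable names are illustrative):

```r
# p-value copied from the q2.test output above
p.value <- 0.009772

# Significance levels used as decision rules
alpha <- c("5%" = 0.05, "1%" = 0.01)

# TRUE means the null hypothesis is rejected at that level
reject <- p.value < alpha
reject
```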

Do the results change if you ignore Males and only test Females? Using only the Female data, repeat the test and write your conclusion in APA format

subset.club2 <- subset(club.df, gender == "F", select = c(drinks, leavealone))
q2.test.f <-  t.test(subset.club2$drinks~subset.club2$leavealone, alternative = "two.sided" )
q2.test.f
## 
##  Welch Two Sample t-test
## 
## data:  subset.club2$drinks by subset.club2$leavealone
## t = -1.3791, df = 53.466, p-value = 0.1736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9844944  0.1821801
## sample estimates:
## mean in group 0 mean in group 1 
##        3.352941        3.754098

The null hypothesis cannot be rejected (t(53.47) = -1.38, p = 0.17). Among women, there is no significant difference in the number of drinks between those who left alone and those who did not.

Q3

apa(q1.test)
## [1] "mean difference = 2.31, t(297.55) = -0.38, p = 0.7 (2-tailed)"
apa(q2.test)
## [1] "mean difference = 0.54, t(121.18) = -2.63, p < 0.01 (2-tailed)"

The results look more or less the same. However, using the apa function makes reporting a lot easier and better looking.
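To illustrate where such a string comes from, here is a rough sketch of a formatter that reads the statistic, degrees of freedom, and p-value from a test result. apa_sketch is a hypothetical helper written for this note, not the apa function used above, and its output format is only an approximation:

```r
# Hypothetical helper, NOT the apa() function used above
apa_sketch <- function(test) {
  p <- test$p.value
  # Report very small p-values as an inequality
  p.text <- if (p < 0.01) "p < 0.01" else sprintf("p = %.2f", p)
  sprintf("t(%.2f) = %.2f, %s", test$parameter, test$statistic, p.text)
}

# Values typed in by hand from the q1.test output
q1.values <- list(statistic = -0.38152, parameter = 297.55, p.value = 0.7031)
apa_sketch(q1.values)

# Works the same way on a real t.test result (made-up data)
set.seed(42)
x <- rnorm(30, mean = 135, sd = 50)
y <- rnorm(30, mean = 137, sd = 50)
apa_sketch(t.test(x, y))
```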

Q4

Create a plot (e.g. scatterplot) showing the relationship between drinks and time.

plot(club.df$time~club.df$drinks)
abline(lm(time~drinks, data = club.df))

Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that people stay at the club for each drink amount.

aggregate(club.df$time~club.df$drinks, FUN = mean, club.df)
##    club.df$drinks club.df$time
## 1               0     85.40000
## 2               1    115.84615
## 3               2     97.03226
## 4               3    129.49123
## 5               4    136.85542
## 6               5    144.95522
## 7               6    155.31034
## 8               7    174.63636
## 9               8    194.00000
## 10              9    258.00000

Is the relationship significant? Conduct a correlation test and save the result as an object called q4.test

q4.test <- cor.test(club.df$time, club.df$drinks)
q4.test
## 
##  Pearson's product-moment correlation
## 
## data:  club.df$time and club.df$drinks
## t = 6.6984, df = 298, p-value = 1.05e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2591255 0.4562998
## sample estimates:
##       cor 
## 0.3617512
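The t statistic reported by the correlation test follows from r and the degrees of freedom via t = r * sqrt(df) / sqrt(1 - r^2). A short check, with the values typed in by hand from the output above (variable names are illustrative):

```r
# Values copied from the q4.test output above
r  <- 0.3617512
df <- 298

# Test statistic for H0: true correlation is 0
t.from.r <- r * sqrt(df) / sqrt(1 - r^2)

# Two-tailed p-value from the t distribution
p.from.t <- 2 * pt(abs(t.from.r), df = df, lower.tail = FALSE)

round(t.from.r, 4)
p.from.t
```

Both values should agree with the reported t and p-value up to rounding of the inputs.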

Write your result in APA format.

apa(q4.test)
## [1] "r = 0.36, t(298) = 6.7, p < 0.01 (2-tailed)"