WPA 6

club <- read.delim("~/Dropbox/RSeminar/club.txt")

A friend of mine is convinced that all women are secretly werewolves. Seriously. Here’s his claim: the longer he’s at a club, the fewer women he sees. Why do women (werewolves) leave so early? He claims that they are drawn to the moonlight outside, so they can’t help but leave (him) before the night is over. Let’s test his claim using our data.

Create a plot (e.g. boxplot or beanplot) showing the distribution of club times for males and females

library("beanplot")
beanplot(club$time ~ club$gender,
         main = "Time Distribution",
         xlab = "Gender",
         ylab = "Time (minutes)")

Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that men and women stayed at the club(s)

aggregate(time ~ gender, data = club, FUN = mean)

##   gender     time
## 1      F 134.4167
## 2      M 136.7292

Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of time women and men spend at clubs. Save the result as an object called q1.test

q1.test <- t.test(time ~ gender, data = club, alternative = "two.sided")

Write your conclusion in APA format. Be sure to address my friend’s claim that women are werewolves.

q1.test$statistic

##          t 
## -0.3815224

q1.test$parameter

##       df 
## 297.5547

q1.test$p.value

## [1] 0.703088

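Women (M = 134.42 minutes) and men (M = 136.73 minutes) did not differ significantly in how long they stayed at the club, t(297.55) = -0.38, p = .70 (two-tailed). The data therefore give no support to the friend’s claim that women leave early, let alone that they are werewolves.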
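`t.test()` runs Welch's test by default (`var.equal = FALSE`), which is why the degrees of freedom above are not a whole number. With the formula interface, the first group is the first factor level (here F), so t = (mean of F - mean of M) / SE, which is why t is negative when women stayed slightly less long. The sketch below recomputes the Welch statistic and the Welch-Satterthwaite df by hand on two hypothetical samples (`x` and `y` are made up) and checks them against `t.test()`.

```r
# Hypothetical toy samples (not the club data)
x <- c(120, 135, 150, 160, 110, 140)
y <- c(130, 125, 170, 155, 145)

# Welch statistic: difference in means over the unpooled standard error
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t.manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom (usually not an integer)
df.manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# t.test() uses Welch's test unless var.equal = TRUE
welch <- t.test(x, y)
c(t = t.manual, df = df.manual)
c(welch$statistic, welch$parameter)
```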

Do the results change if you only look at people who were at the Blechnerei? Using only the Blechnerei data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)

q1.test <- t.test(time ~ gender, data = club, subset = club == "Blechnerei", alternative = "two.sided")
q1.test$statistic

##          t 
## 0.06275184

q1.test$parameter

##      df 
## 104.101

q1.test$p.value

## [1] 0.9500844

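The conclusion does not change for the Blechnerei alone: women and men did not differ significantly in how long they stayed, t(104.10) = 0.06, p = .95 (two-tailed).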
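The `subset` argument of the formula method of `t.test()` gives the same result as filtering the data frame first and then testing. A minimal check on a hypothetical data frame (`toy`, `blech`, `t1` and `t2` are illustrative names, not from the assignment):

```r
# Hypothetical toy data (not the club data): two clubs, alternating gender
toy <- data.frame(club   = rep(c("Blechnerei", "Kantine"), each = 6),
                  gender = rep(c("F", "M"), times = 6),
                  time   = c(100, 120, 130, 125, 150, 160,
                             90, 180, 110, 170, 140, 200))

# Option 1: the subset argument, evaluated inside the data frame
t1 <- t.test(time ~ gender, data = toy, subset = club == "Blechnerei")

# Option 2: subset the data frame first, then test
blech <- subset(toy, club == "Blechnerei")
t2 <- t.test(time ~ gender, data = blech)

c(t1$statistic, t2$statistic)
```

Option 2 is a little longer but keeps the filtered data around, which is convenient when the same subset feeds several tests or plots.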

Another friend has other club-related ideas. According to her, if you’re looking to meet a nice lady or gentleman at the club, you should definitely have a few drinks to help you loosen up. Do our data support her claim? Test this by answering the question: Do people that did not leave alone tend to drink more or less than people who did leave alone?

Create a plot (e.g. boxplot or beanplot) showing the distribution of drinks for people that did and did not leave alone

beanplot(club$drinks ~ club$leavealone,
         main = "Drink Distribution",
         xlab = "Left alone",
         ylab = "Drinks")

Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.

aggregate(drinks ~ leavealone, data = club, FUN = mean)

##   leavealone   drinks
## 1          0 3.577465
## 2          1 4.117904

Conduct a two-tailed t-test testing whether or not there is a significant difference in the number of drinks people had when they went home alone versus not alone. Save the result as an object called q2.test

q2.test <- t.test(drinks ~ leavealone, data = club, alternative = "two.sided")
q2.test$statistic

##         t 
## -2.625326

q2.test$parameter

##       df 
## 121.1829

q2.test$p.value

## [1] 0.009772036

Write your conclusion in APA format.

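People who left alone (leavealone = 1, M = 4.12 drinks) drank significantly more than people who did not (leavealone = 0, M = 3.58 drinks), t(121.18) = -2.63, p = .010 (two-tailed). The data therefore do not support the claim that a few drinks help you meet someone; if anything, those who left alone drank more.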

Do the results change if you ignore males and only test females? Using only the female data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)

q2.test <- t.test(drinks ~ leavealone, data = club, subset = gender == "F", alternative = "two.sided")
q2.test$statistic

##         t 
## -1.379058

q2.test$parameter

##       df 
## 53.46589

q2.test$p.value

## [1] 0.1736189

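Among women only, the difference in drinks between those who did and did not leave alone is not significant, t(53.47) = -1.38, p = .17 (two-tailed), so the overall result does not hold for women on their own.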

In a later chapter, we’ll learn how to write custom functions that make a lot of your programming life much easier. For example, you can write a custom function that takes a t.test object as an input, and spits out an APA style conclusion as an output! In this question, we’ll create a function called ‘apa’ that does just that:

Execute all the code in the following chunk to save the new function.

apa <- function(test.object, tails = 2, sig.digits = 2, p.lb = .01) {

  # First letter of the statistic's name: "t" for t-tests, "X" for X-squared
  statistic.id <- substr(names(test.object$statistic), start = 1, stop = 1)
  p.value <- test.object$p.value

  # Halve the two-sided p-value for a one-tailed report
  if (tails == 1) {p.value <- p.value / 2}

  # Report small p-values as "p < p.lb"; otherwise round them
  if (p.value < p.lb) {
    p.display <- paste("p < ", p.lb, " (", tails, "-tailed)", sep = "")
  } else {
    p.display <- paste("p = ", round(p.value, sig.digits), " (", tails, "-tailed)", sep = "")
  }

  # Defaults, overwritten below depending on the type of test
  estimate.display <- ""
  add.par <- ""

  if (grepl("product-moment", test.object$method)) {
    estimate.display <- paste("r = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }

  if (grepl("Chi", test.object$method)) {
    add.par <- paste(", N = ", sum(test.object$observed), sep = "")
  }

  if (grepl("One Sample t-test", test.object$method)) {
    estimate.display <- paste("mean = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }

  if (grepl("Two Sample t-test", test.object$method)) {
    # Group 1 minus group 2: the same direction t.test() uses for its t statistic
    estimate.display <- paste("mean difference = ",
                              round(test.object$estimate[1] - test.object$estimate[2], sig.digits),
                              ", ", sep = "")
  }

  return(paste(estimate.display,
               statistic.id,
               "(",
               round(test.object$parameter, sig.digits),
               add.par,
               ") = ",
               round(test.object$statistic, sig.digits),
               ", ",
               p.display,
               sep = ""))
}

Now, try the function on your previous test results from Q1 and Q2 by executing the following two lines of code. Because q1.test and q2.test were overwritten above, they now hold the Blechnerei-only and female-only tests.

apa(q1.test)

## [1] "mean difference = 0.66, t(104.1) = 0.06, p = 0.95 (2-tailed)"

apa(q2.test)

## [1] "mean difference = -0.4, t(53.47) = -1.38, p = 0.17 (2-tailed)"
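The `tails = 1` option of `apa()` simply halves the two-sided p-value. That equals the proper one-tailed p-value only when the observed effect points in the hypothesized direction; otherwise the one-tailed p-value is 1 - p/2. The sketch below checks this with `t.test()`'s `alternative` argument on two hypothetical samples (`x`, `y` and the object names are illustrative).

```r
# Hypothetical toy samples (not the club data); mean(x) < mean(y)
x <- c(3, 4, 2, 5, 4, 3)
y <- c(5, 6, 4, 6, 5, 7)

two     <- t.test(x, y)                          # two-sided
less    <- t.test(x, y, alternative = "less")    # H1: mean(x) < mean(y)
greater <- t.test(x, y, alternative = "greater") # H1: mean(x) > mean(y)

# t is negative here, so only the "less" p-value equals half the two-sided one
c(two = two$p.value, less = less$p.value, greater = greater$p.value)
```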

Yet another friend of mine has some claims about club life. According to her, the main reason people drink at clubs isn’t to loosen up, it’s to stay awake! Is she right? Is there a relationship between the number of drinks a person has and how long they stay at the club?

Create a plot (e.g. scatterplot) showing the relationship between drinks and time.

plot(drinks ~ time, data = club,
     main = "Drinks vs. Time at the Club",
     xlab = "Time (minutes)",
     ylab = "Drinks")

Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that people stay at the club for each drink amount.

aggregate(time ~ drinks, data = club, FUN = mean)

##    drinks      time
## 1       0  85.40000
## 2       1 115.84615
## 3       2  97.03226
## 4       3 129.49123
## 5       4 136.85542
## 6       5 144.95522
## 7       6 155.31034
## 8       7 174.63636
## 9       8 194.00000
## 10      9 258.00000
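Means for the extreme drink counts may rest on very few people, so it can help to print each group's size next to its mean. A minimal sketch on hypothetical data (`toy`, `means`, `counts` and `res` are illustrative names; the values are made up):

```r
# Hypothetical toy data (not the club data)
toy <- data.frame(drinks = c(0, 1, 1, 2, 2, 2, 9),
                  time   = c(60, 90, 110, 100, 120, 140, 250))

# Mean time per drink count
means <- aggregate(time ~ drinks, data = toy, FUN = mean)

# Number of people per drink count
counts <- aggregate(time ~ drinks, data = toy, FUN = length)
names(counts)[2] <- "n"

# One row per drink count: mean time and group size
res <- merge(means, counts, by = "drinks")
res
```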

Is the relationship significant? Conduct a correlation test and save the result as an object called q4.test

q4.test <- cor.test(club$time, club$drinks)

Write your result in APA format.

apa(q4.test)

## [1] "r = 0.36, t(298) = 6.7, p < 0.01 (2-tailed)"
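The `t(298)` in this output comes from how `cor.test()` tests a Pearson correlation: it converts r to t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, so 300 people give df = 298. A quick check on made-up vectors (`x`, `y` and `t.from.r` are illustrative names):

```r
# Hypothetical toy vectors (not the club data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 9)

ct <- cor.test(x, y)
r  <- cor(x, y)
n  <- length(x)

# t statistic implied by r, with n - 2 degrees of freedom
t.from.r <- r * sqrt(n - 2) / sqrt(1 - r^2)
c(t.from.r, ct$statistic)
c(n - 2, ct$parameter)
```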

Repeat the test but only for females at Blechnerei. Do you get the same conclusion? Write the results of this test in APA format

fem.blech <- subset(club, gender == "F" & club == "Blechnerei")
q4.test <- cor.test(fem.blech$time, fem.blech$drinks)
apa(q4.test)

## [1] "r = 0.34, t(59) = 2.76, p < 0.01 (2-tailed)"

I don’t know about Germany, but in the US, we refer to clubs with mostly guys as “sausage fests.” Maybe in Germany you’d call it a Wurstfest. Let’s see if any of the clubs were a Wurstfest on this day.

What is the percentage of males in each club? (Hint: Calculate a new binary variable called gender.log with 1 meaning male and 0 meaning female. Then, use grouped aggregation to calculate the percentage of 1s in gender.log for each club)

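# Store gender.log as a column of club: 1 = male, 0 = female
club$gender.log <- as.numeric(club$gender == "M")
# The mean of a 0/1 variable is the proportion of males (multiply by 100 for %)
agg <- aggregate(gender.log ~ club, data = club, FUN = mean)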

Plot the results using a barplot. Set the height argument to be the percentage of males in each club, and set the names argument to be the names of the clubs. (For bonus points, set the color of the bars for any Wurstfests to “royalblue3”.)

barplot(height = agg$gender.log,
        names = agg$club,
        col = "white"
        )
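For the bonus, one way to colour the bars is to build a colour vector with `ifelse()`. This sketch assumes a club counts as a Wurstfest when more than half of its patrons are male (that cutoff is our assumption, not part of the assignment) and uses a hypothetical data frame `toy`; with the real data, the same `ifelse()` call on `agg$gender.log` could be passed as `col =` to `barplot()`.

```r
# Hypothetical toy data (not the club data)
toy <- data.frame(club   = c("A", "A", "A", "B", "B", "B", "B"),
                  gender = c("M", "M", "F", "F", "F", "M", "F"))

# 1 = male, 0 = female; the mean per club is the proportion of males
toy$gender.log <- as.numeric(toy$gender == "M")
agg.toy <- aggregate(gender.log ~ club, data = toy, FUN = mean)

# Assumed Wurstfest cutoff: more than 50% male
bar.cols <- ifelse(agg.toy$gender.log > 0.5, "royalblue3", "white")

barplot(height = 100 * agg.toy$gender.log,  # percentage of males
        names = agg.toy$club,
        col = bar.cols,
        ylab = "Males (%)")
```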

Is there a significant relationship between clubs and gender? Answer this using a chi-square test. Run the test and save the result in an object called q4.test

q4.test <- chisq.test(club$club, club$gender)

What is your conclusion in APA format?

apa(q4.test)

## [1] "X(2, N = 300) = 13.74, p < 0.01 (2-tailed)"
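The chi-square statistic compares observed counts with the counts expected if club and gender were independent (row total × column total / grand total), and its degrees of freedom are (rows - 1) × (columns - 1). With two genders, the df of 2 above corresponds to three clubs. The sketch below reproduces `chisq.test()` by hand on a hypothetical 3 × 2 table (`obs`, `expected`, `x2` and `dof` are illustrative names; the counts are made up).

```r
# Hypothetical 3 x 2 table of counts (clubs by gender), not the club data
obs <- matrix(c(30, 20,
                25, 35,
                40, 50),
              nrow = 3, byrow = TRUE,
              dimnames = list(club = c("A", "B", "C"), gender = c("F", "M")))

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic and its degrees of freedom
x2  <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)

# No continuity correction is applied to tables larger than 2 x 2
res <- chisq.test(obs)
c(x2, res$statistic)
c(dof, res$parameter)
```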

Was there a significant difference between just Kantine and Barry’s? Do the test again using only data from these two clubs. Report your results in APA format.

kantine.barrys <- subset(club, club %in% c("Kantine", "Barrys"))
q4.test <- chisq.test(kantine.barrys$club, kantine.barrys$gender)
apa(q4.test)

## [1] "X(1, N = 188) = 12.24, p < 0.01 (2-tailed)"
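Because the Kantine vs. Barry’s table is 2 × 2, `chisq.test()` applies Yates' continuity correction by default, which shrinks each |observed - expected| by 0.5 and gives a somewhat smaller statistic than the textbook Pearson formula; `correct = FALSE` turns it off. The method string still contains "Chi", so `apa()` formats it the same way. A sketch on a hypothetical 2 × 2 table (`obs`, `yates`, `plain` and the `x2.*` names are illustrative; the counts are made up):

```r
# Hypothetical 2 x 2 table of counts (not the club data)
obs <- matrix(c(30, 50,
                45, 25), nrow = 2, byrow = TRUE)

yates <- chisq.test(obs)                   # default correct = TRUE for 2 x 2 tables
plain <- chisq.test(obs, correct = FALSE)  # uncorrected Pearson statistic

expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
x2.plain <- sum((obs - expected)^2 / expected)
# Yates: shrink each |O - E| by 0.5 (valid here because every |O - E| exceeds 0.5)
x2.yates <- sum((abs(obs - expected) - 0.5)^2 / expected)

c(corrected = unname(yates$statistic), uncorrected = unname(plain$statistic))
yates$method
```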

Who is more likely to leave a club alone, men or women? Calculate the percentage of men and women who leave alone.

library(dplyr)


alone.by.gender <- club %>%
  group_by(gender) %>%
  summarise(
    p.alone = mean(leavealone),  # proportion (0 to 1) who left alone
    number = n())                # group size

Plot the results using a barplot. Set the height argument to be the percentage of people who leave alone for each gender, and set the names argument to be the gender.

barplot(height = alone.by.gender$p.alone,
        names = alone.by.gender$gender,
        col = "white"
        )
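Base R can produce the same proportions without dplyr. In this sketch on hypothetical data (`toy`, `tab` and `props` are illustrative names; the values are made up), `prop.table()` with `margin = 1` turns counts into proportions within each gender, and those proportions match the mean of the 0/1 variable per gender. The same cross-tabulated counts are what a chi-square test of gender by leaving alone is built on.

```r
# Hypothetical toy data (not the club data)
toy <- data.frame(gender     = c("F", "F", "F", "M", "M", "M", "M"),
                  leavealone = c(1, 0, 1, 1, 1, 0, 1))

# Counts of leavealone (0/1) within each gender
tab <- table(toy$gender, toy$leavealone)

# margin = 1 divides each row by its row total: proportions within gender
props <- prop.table(tab, margin = 1)
props

# Same numbers as the per-gender mean of the 0/1 variable
tapply(toy$leavealone, toy$gender, mean)
```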

Is there a significant relationship between gender and leaving alone? Answer this using a chi-square test. Run the test and save the result in an object called q6.test

q6.test <- chisq.test(club$leavealone, club$gender)

What is your conclusion in APA format?

apa(q6.test)

## [1] "X(1, N = 300) = 0.43, p = 0.51 (2-tailed)"

Does your conclusion hold if you only include people who stayed at the club for more than 60 minutes? Repeat the test on these data and report your conclusions in APA format.

long.stay <- subset(club, time > 60)
q6.test <- chisq.test(long.stay$leavealone, long.stay$gender)
apa(q6.test)

## [1] "X(1, N = 277) = 0.88, p = 0.35 (2-tailed)"


Anna Martin

9 December 2015