WPA6


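# Read the tab-separated club data; TRUE/FALSE are spelled out because
# T and F are ordinary variables that can be overwritten.
club.df <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/12/club.txt", 
                     sep = "\t", 
                     header = TRUE, 
                     stringsAsFactors = FALSE)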

Q1 A friend of mine is convinced that all women are secretly werewolves. Seriously. Here’s his claim: the longer he’s at a club, the fewer women he sees. Why do women (werewolves) leave so early? He claims that they are drawn to the moonlight outside, so they can’t help but leave (him) before the night is over. Let’s test his claim using our data.

a) Create a plot (e.g. boxplot or beanplot) showing the distribution of club times for males and females

boxplot(formula = time ~ gender, 
        data = club.df,
        xlab = "gender",
        ylab = "time",
        main = "Club times for males and females")

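b) Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of minutes that men and women stayed at the club(s)

# The formula is passed unnamed: the name of aggregate()'s first argument
# changed from `formula` to `x` in R 4.2.0, so the unnamed form is portable.
aggregate(time ~ gender, 
          data = club.df, 
          FUN = mean, 
          na.rm = TRUE)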

##   gender     time
## 1      F 134.4167
## 2      M 136.7292
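
The same grouped means can also be obtained with `tapply()`. The following is a minimal sketch on a made-up data frame (`toy.df` and its values are hypothetical and not taken from club.df); it mainly illustrates how the two functions treat missing values differently.

```r
# Hypothetical toy data; the values are made up and are not taken from club.df.
toy.df <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M"),
  time   = c(100, 120, NA, 95, 115, 135)
)

# aggregate() with a formula drops rows containing NA before grouping
# (its default na.action is na.omit), so na.rm is not needed here.
agg.means <- aggregate(time ~ gender, data = toy.df, FUN = mean)

# tapply() keeps the NA, so na.rm = TRUE has to be passed on to mean().
tap.means <- tapply(toy.df$time, toy.df$gender, mean, na.rm = TRUE)

agg.means
tap.means
```

`aggregate()` returns a data frame with one row per group, while `tapply()` returns a named vector, which can be handier when a single group mean is needed (e.g. `tap.means["F"]`).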

c) Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of time women and men spend at clubs. Save the result as an object called q1.test

men.time <- subset(club.df, subset = gender == "M")$time
female.time <- subset(club.df, subset = gender == "F")$time 

q1.test <- t.test(x = men.time,
                  y = female.time,
                  alternative = "two.sided")

q1.test

## 
##  Welch Two Sample t-test
## 
## data:  men.time and female.time
## t = 0.38152, df = 297.55, p-value = 0.7031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.615836 14.240836
## sample estimates:
## mean of x mean of y 
##  136.7292  134.4167

d) Write your conclusion in APA format. Be sure to address my friend’s claim that women are werewolves.

Results: Men (M = 136.73 min) and women (M = 134.42 min) did not differ significantly in the time they spent at the club, t(297.55) = 0.38, p = .70. The data give no evidence that women leave the club earlier than men, so there is no support for the claim that women are werewolves!
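
The numbers in an APA-style statement do not have to be copied by hand from the printed output. As a rough sketch (the samples `x` and `y` are made up, and the `sprintf()` format is just one possible layout), the pieces can be pulled out of the object that `t.test()` returns:

```r
# Hypothetical toy samples; the values are made up for illustration.
x <- c(120, 135, 150, 110, 140)
y <- c(125, 130, 145, 115, 138)

toy.test <- t.test(x = x, y = y, alternative = "two.sided")

# An "htest" object is a list; these element names are part of base R.
t.value  <- unname(toy.test$statistic)   # t statistic
df.value <- unname(toy.test$parameter)   # Welch-adjusted degrees of freedom
p.value  <- toy.test$p.value             # two-sided p-value

# Assemble the statistic part of the sentence, rounded to two decimals.
apa.string <- sprintf("t(%.2f) = %.2f, p = %.2f", df.value, t.value, p.value)
apa.string
```

Note that `sprintf()` keeps the leading zero of the p-value, whereas APA style usually drops it (p = .70 rather than p = 0.70).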

e) Do the results change if you only look at people who were at the Blechnerei? Using only the Blechnerei data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)

male.time.bleche <- subset(club.df, gender == "M" & club == "Blechnerei")$time
female.time.bleche <- subset(club.df, gender == "F" & club == "Blechnerei")$time

q2.test <- t.test(x = male.time.bleche,
                  y = female.time.bleche,
                  alternative = "two.sided")

q2.test

## 
##  Welch Two Sample t-test
## 
## data:  male.time.bleche and female.time.bleche
## t = -0.062752, df = 104.1, p-value = 0.9501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.61866  20.29240
## sample estimates:
## mean of x mean of y 
##  140.2549  140.9180

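Results: Among people at the Blechnerei, men (M = 140.25 min) and women (M = 140.92 min) did not differ significantly in the time they spent at the club, t(104.10) = -0.06, p = .95. The difference isn’t significant here either, so it seems like your friend’s hypothesis is wrong ;)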
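
Instead of building two vectors with `subset()`, the restriction to one club can be combined with the formula interface of `t.test()`. This is only a sketch on made-up data (`toy.df`, the club names "A" and "B", and all values are hypothetical):

```r
# Hypothetical toy data; values and club names are made up for illustration.
toy.df <- data.frame(
  club   = rep(c("A", "B"), each = 6),
  gender = rep(c("F", "M"), times = 6),
  time   = c(100, 120, 110, 130, 105, 125, 90, 95, 100, 85, 110, 105)
)

# Restrict to one club first, then let the formula split time by gender.
club.a <- subset(toy.df, club == "A")
formula.test <- t.test(time ~ gender, data = club.a, alternative = "two.sided")

# Equivalent two-vector call; the formula version orders the groups
# alphabetically, so "F" plays the role of x and "M" the role of y.
vector.test <- t.test(x = club.a$time[club.a$gender == "F"],
                      y = club.a$time[club.a$gender == "M"])

all.equal(unname(formula.test$statistic), unname(vector.test$statistic))
```

Because the formula version puts "F" first, its t value has the opposite sign of a call in which the male times are passed as `x`, as in the code above; the p-value is the same either way.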

Q2

a) Create a plot (e.g. boxplot or beanplot) showing the distribution of drinks for people that did and did not leave alone

boxplot(formula = drinks ~ leavealone,
        data = club.df,
        xlab = "leavealone",
        ylab = "drinks",
        main = "drinks for loners or one-night-stands")

b) Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.

```{

drink.alone <- subset(club.df, leavealone == 1)$drinks
drink.company <- subset(club.df, leavealone == 0)$drinks

mean(drink.alone)

## [1] 4.117904

mean(drink.company)

## [1] 3.577465

### now with aggregate (formula passed unnamed so it also runs in R >= 4.2.0):
aggregate(drinks ~ leavealone,
          data = club.df,
          FUN = mean, 
          na.rm = TRUE)

##   leavealone   drinks
## 1          0 3.577465
## 2          1 4.117904

c) Conduct a two-tailed t-test testing whether or not there is a significant difference in the amount of drinks people had when they went home alone versus not alone. Save the result as an object called q2.test

q3.test <- t.test(x = drink.alone,
                  y = drink.company,
                  alternative = "two.sided")
q3.test

## 
##  Welch Two Sample t-test
## 
## data:  drink.alone and drink.company
## t = 2.6253, df = 121.18, p-value = 0.009772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1328990 0.9479793
## sample estimates:
## mean of x mean of y 
##  4.117904  3.577465

Results: People who left the club alone had significantly more drinks (M = 4.12) than people who left in company (M = 3.58), t(121.18) = 2.63, p = .010. The hypothesis that people who leave the club in company had more drinks is therefore not supported: they had significantly fewer drinks than people who left alone.

d) Do the results change if you ignore Males and only test Females? Using only the Female data, repeat the test and write your conclusion in APA format (Hint: Use subset()!)

# leavealone is compared with the numbers 1 and 0, as in the code above
female.alone <- subset(club.df, leavealone == 1 & gender == "F")$drinks
female.company <- subset(club.df, leavealone == 0 & gender == "F")$drinks

q4.test <- t.test(x = female.alone,
                  y = female.company,
                  alternative = "two.sided")
q4.test

## 
##  Welch Two Sample t-test
## 
## data:  female.alone and female.company
## t = 1.3791, df = 53.466, p-value = 0.1736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1821801  0.9844944
## sample estimates:
## mean of x mean of y 
##  3.754098  3.352941

Results: Women who left the club alone (M = 3.75) and women who left in company (M = 3.35) did not differ significantly in their number of drinks, t(53.47) = 1.38, p = .17. So among women there is no significant difference in drinks between those who leave in company and those who leave alone.

Q3

In a later chapter, we’ll learn how to write custom functions that make a lot of your programming life much easier. For example, you can write a custom function that takes a t.test object as an input, and spits out an APA style conclusion as an output! In this question, we’ll create a function called ‘apa’ that does just that:

First, load the function into R. You can do this in one of two ways. Either re-download the yarrr package (I uploaded the package online earlier today) or execute all the code in the following chunk.

apa <- function(test.object, tails = 2, sig.digits = 2, p.lb = .01) {

  statistic.id <- substr(names(test.object$statistic), start = 1, stop = 1)
  p.value <- test.object$p.value

  if(tails == 1) {p.value <- p.value / 2}

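  # Report "p < p.lb" for very small p-values and the rounded value otherwise.
  # if/else also covers p.value == p.lb, which two separate conditions
  # would leave without a p.display.
  if (p.value < p.lb) {
    p.display <- paste("p < ", p.lb, " (", tails, "-tailed)", sep = "")
  } else {
    p.display <- paste("p = ", round(p.value, sig.digits), " (", tails, "-tailed)", sep = "")
  }

  # Defaults, so that test types not matched below do not leave these undefined
  estimate.display <- ""
  add.par <- ""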

  if(grepl("product-moment", test.object$method)) {

    estimate.display <- paste("r = ", round(test.object$estimate, sig.digits), ", ", sep = "")

  }

  if(grepl("Chi", test.object$method)) {

    estimate.display <- ""

    add.par <- paste(", N = ", sum(test.object$observed), sep = "")

  }

  if(grepl("One Sample t-test", test.object$method)) {

    estimate.display <- paste("mean = ", round(test.object$estimate, sig.digits), ", ", sep = "")

  }

  if(grepl("Two Sample t-test", test.object$method)) {

    estimate.display <- paste("mean difference = ", round(test.object$estimate[2] - test.object$estimate[1], sig.digits), ", ", sep = "")

  }




  return(paste(
    estimate.display,
    statistic.id,
    "(",
    round(test.object$parameter, sig.digits),
    add.par,
    ") = ",
    round(test.object$statistic, sig.digits),
    ", ",
    p.display,
    sep = ""
  ))

}

Now, try the function on your previous test results from Q1 and Q2 by executing the following two lines of code.

apa(q1.test)

## [1] "mean difference = -2.31, t(297.55) = 0.38, p = 0.7 (2-tailed)"

apa(q2.test)

## [1] "mean difference = 0.66, t(104.1) = -0.06, p = 0.95 (2-tailed)"

```
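
In both apa() outputs above, the reported mean difference and the t value have opposite signs (for example, -2.31 next to t = 0.38). A minimal sketch of why, using made-up samples (`x`, `y` and their values are hypothetical): `t.test()` bases the statistic on mean(x) - mean(y), while the apa() code subtracts the estimates in the other order.

```r
# Hypothetical toy samples; the values are made up for illustration.
x <- c(10, 12, 14, 16, 18)
y <- c(9, 10, 11, 12, 13)

toy.test <- t.test(x = x, y = y)

# t.test() computes the statistic from mean(x) - mean(y) ...
diff.xy <- unname(toy.test$estimate[1] - toy.test$estimate[2])
# ... whereas the apa() code above reports estimate[2] - estimate[1].
diff.yx <- unname(toy.test$estimate[2] - toy.test$estimate[1])

sign(diff.xy) == sign(toy.test$statistic)   # same sign as t
sign(diff.yx) == sign(toy.test$statistic)   # opposite sign
```

Neither order is wrong on its own, but when writing up a result it helps to state which group was subtracted from which, so that the sign of the difference and the sign of t can be read together.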
