This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
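club.df <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/12/club.txt",
                      sep = "\t",
                      header = TRUE,
                      stringsAsFactors = FALSE)
# Compare the distribution of time between genders
boxplot(formula = time ~ gender,
        data = club.df,
        xlab = "gender",
        ylab = "time",
        main = "Club times for males and females")
# Mean time per gender; the formula is passed positionally, which works
# regardless of how aggregate()'s formula method names its first argument
aggregate(time ~ gender,
          data = club.df,
          FUN = mean,
          na.rm = TRUE)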
## gender time
## 1 F 134.4167
## 2 M 136.7292
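As a cross-check on grouped means like the ones above, the same summary can be computed with `tapply()`. The sketch below uses a small made-up data frame (`toy.df` and its values are hypothetical, not the club data) so that it runs without downloading anything.

```r
# Hypothetical toy data standing in for club.df (values are made up)
toy.df <- data.frame(
  gender = c("F", "F", "M", "M", "M"),
  time   = c(100, 140, 120, NA, 160),
  stringsAsFactors = FALSE
)

# Formula interface: one row per group; rows containing NA are dropped by default
agg.means <- aggregate(time ~ gender, data = toy.df, FUN = mean)

# tapply(): a named vector of group means; na.rm = TRUE is passed on to mean()
tap.means <- tapply(toy.df$time, toy.df$gender, mean, na.rm = TRUE)

agg.means
tap.means
```

Both calls should agree on the group means; `aggregate()` returns a data frame, while `tapply()` returns a named vector, which is convenient for quick indexing such as `tap.means["F"]`.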
men.time <- subset(club.df, subset = gender == "M")$time
female.time <- subset(club.df, subset = gender == "F")$time
q1.test <- t.test(x = men.time,
                  y = female.time,
                  alternative = "two.sided")
q1.test
##
## Welch Two Sample t-test
##
## data: men.time and female.time
## t = 0.38152, df = 297.55, p-value = 0.7031
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.615836 14.240836
## sample estimates:
## mean of x mean of y
## 136.7292 134.4167
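The fractional degrees of freedom in the output above (df = 297.55) arise because `t.test()` defaults to the Welch test, which does not assume equal variances. The following is a minimal sketch of that calculation done by hand. The vectors `x` and `y` are simulated stand-ins (hypothetical, not the club data), so the numbers will differ from those reported above.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible
x <- rnorm(40, mean = 137, sd = 50)  # hypothetical stand-in for men.time
y <- rnorm(35, mean = 134, sd = 45)  # hypothetical stand-in for female.time

# Squared standard errors of the two sample means
se2.x <- var(x) / length(x)
se2.y <- var(y) / length(y)

# Welch t statistic: difference in means divided by its standard error
t.manual <- (mean(x) - mean(y)) / sqrt(se2.x + se2.y)

# Welch-Satterthwaite approximation to the degrees of freedom
df.manual <- (se2.x + se2.y)^2 /
  (se2.x^2 / (length(x) - 1) + se2.y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p.manual <- 2 * pt(-abs(t.manual), df = df.manual)

# Compare the manual values with what t.test() reports
welch <- t.test(x = x, y = y, alternative = "two.sided")
c(t = t.manual, df = df.manual, p = p.manual)
welch
```

The manual values should match the `t`, `df`, and `p-value` fields printed by `t.test()`.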
male.time.bleche <- subset(club.df, gender == "M" & club == "Blechnerei")$time
female.time.bleche <- subset(club.df, gender == "F" & club == "Blechnerei")$time
q2.test <- t.test(x = male.time.bleche,
                  y = female.time.bleche,
                  alternative = "two.sided")
q2.test
##
## Welch Two Sample t-test
##
## data: male.time.bleche and female.time.bleche
## t = -0.062752, df = 104.1, p-value = 0.9501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.61866 20.29240
## sample estimates:
## mean of x mean of y
## 140.2549 140.9180
boxplot(formula = drinks ~ leavealone,
        data = club.df,
        xlab = "leavealone",
        ylab = "drinks",
        main = "drinks for loners or one-night-stands")
#### b) Using grouped aggregation (e.g., aggregate or dplyr), calculate the mean number of drinks people had when they went home alone or not alone.
```{r}
drink.alone <- subset(club.df, leavealone == 1)$drinks
drink.company <- subset(club.df, leavealone == 0)$drinks
mean(drink.alone)
## [1] 4.117904
mean(drink.company)
## [1] 3.577465
# Now with aggregate():
aggregate(drinks ~ leavealone,
          data = club.df,
          FUN = mean,
          na.rm = TRUE)
## leavealone drinks
## 1 0 3.577465
## 2 1 4.117904
q3.test <- t.test(x = drink.alone,
                  y = drink.company,
                  alternative = "two.sided")
q3.test
##
## Welch Two Sample t-test
##
## data: drink.alone and drink.company
## t = 2.6253, df = 121.18, p-value = 0.009772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1328990 0.9479793
## sample estimates:
## mean of x mean of y
## 4.117904 3.577465
# leavealone is compared with the numbers 1 and 0, as in the subsets above
female.alone <- subset(club.df, leavealone == 1 & gender == "F")$drinks
female.company <- subset(club.df, leavealone == 0 & gender == "F")$drinks
q4.test <- t.test(x = female.alone,
                  y = female.company,
                  alternative = "two.sided")
q4.test
##
## Welch Two Sample t-test
##
## data: female.alone and female.company
## t = 1.3791, df = 53.466, p-value = 0.1736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1821801 0.9844944
## sample estimates:
## mean of x mean of y
## 3.754098 3.352941
# There is no significant difference in the number of drinks between women who
# leave the club alone and women who leave in company
# (Welch two-sample t-test, t(53.47) = 1.38, p = 0.17).

# In a later chapter, we'll learn how to write custom functions that make
# programming much easier. For example, a custom function can take a t.test
# object as input and return an APA-style conclusion as output. In this
# question, we'll create a function called 'apa' that does just that:
apa <- function(test.object, tails = 2, sig.digits = 2, p.lb = .01) {

  # First letter of the statistic's name, e.g. "t" or "X"
  statistic.id <- substr(names(test.object$statistic), start = 1, stop = 1)

  # Halve the two-sided p-value when a one-tailed result is requested
  p.value <- test.object$p.value
  if (tails == 1) {p.value <- p.value / 2}

  # Report "p < p.lb" for small p-values, otherwise the rounded p-value
  if (p.value < p.lb) {
    p.display <- paste("p < ", p.lb, " (", tails, "-tailed)", sep = "")
  } else {
    p.display <- paste("p = ", round(p.value, sig.digits), " (", tails, "-tailed)", sep = "")
  }

  # Defaults, so that unrecognised test types still produce a result
  estimate.display <- ""
  add.par <- ""

  if (grepl("product-moment", test.object$method)) {
    estimate.display <- paste("r = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }
  if (grepl("Chi", test.object$method)) {
    estimate.display <- ""
    add.par <- paste(", N = ", sum(test.object$observed), sep = "")
  }
  if (grepl("One Sample t-test", test.object$method)) {
    estimate.display <- paste("mean = ", round(test.object$estimate, sig.digits), ", ", sep = "")
  }
  if (grepl("Two Sample t-test", test.object$method)) {
    # Computed as mean of y minus mean of x, so its sign is opposite to that of t
    estimate.display <- paste("mean difference = ", round(test.object$estimate[2] - test.object$estimate[1], sig.digits), ", ", sep = "")
  }

  return(paste(estimate.display,
               statistic.id,
               "(",
               round(test.object$parameter, sig.digits),
               add.par,
               ") = ",
               round(test.object$statistic, sig.digits),
               ", ",
               p.display,
               sep = ""))
}
apa(q1.test)
## [1] "mean difference = -2.31, t(297.55) = 0.38, p = 0.7 (2-tailed)"
apa(q2.test)
## [1] "mean difference = 0.66, t(104.1) = -0.06, p = 0.95 (2-tailed)"
```
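The `apa` function works because `t.test()` returns a list-like object of class `"htest"` whose components can be accessed with `$`. The sketch below prints the components that `apa` reads. The object `sim.test` is built from simulated data and is hypothetical, so its values are unrelated to the club results.

```r
set.seed(2)  # arbitrary seed; the data below are simulated, not the club data
sim.test <- t.test(x = rnorm(30, mean = 4), y = rnorm(30, mean = 3.5))

class(sim.test)             # t.test() returns an object of class "htest"
names(sim.test$statistic)   # label of the test statistic
sim.test$parameter          # degrees of freedom
sim.test$p.value            # p-value as a plain number
sim.test$estimate           # the two sample means, x first and then y
sim.test$method             # text that apa() searches with grepl()
```

Running `names(sim.test)` lists every component, which is a quick way to see what else could be reported.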
You can also embed plots, for example:
```{r, echo=FALSE}
apa(q1.test)
## [1] "mean difference = -2.31, t(297.55) = 0.38, p = 0.7 (2-tailed)"
```
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
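One caveat about the `tails = 1` option of `apa`, which halves the two-sided p-value: the halved value matches a one-sided test only when the observed difference lies in the hypothesised direction. A minimal sketch on simulated data (the vectors `x` and `y` are hypothetical, not the club data) compares it with the `alternative` argument of `t.test()`:

```r
set.seed(3)  # arbitrary seed; simulated data, not the club data
x <- rnorm(25, mean = 4.1)
y <- rnorm(25, mean = 3.6)

two.sided <- t.test(x = x, y = y, alternative = "two.sided")
greater   <- t.test(x = x, y = y, alternative = "greater")  # H1: mean(x) > mean(y)
less      <- t.test(x = x, y = y, alternative = "less")     # H1: mean(x) < mean(y)

half.p <- two.sided$p.value / 2

# The halved p-value equals the one-sided p-value in the direction of the
# observed difference, and one minus it in the opposite direction
c(half = half.p, greater = greater$p.value, less = less$p.value)
```

When the direction of the hypothesis matters, requesting the one-sided test directly through `alternative` avoids relying on the halving shortcut.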