title: "WPA #6"
author: "Rebekka Herz"
date: "11 June 2015"
output: html_document

1. Download the dataframe pirate_survey_noerrors.txt from http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt. The data are stored in a tab-separated text file with headers. Load the dataframe into an object called pirates. Because it’s tab-separated, use sep = "\t".

pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

2. Conduct a one-sample t-test to test whether or not the mean age of pirates is significantly different from 25. What is the test statistic, p-value, and 95% confidence interval? (Note: access these directly from the object, don’t type them manually). What is your conclusion?

test.result <- t.test(x = pirates$age, 
                      mu = 25, 
                      alternative = "two.sided"
                      )
test.result

## 
##  One Sample t-test
## 
## data:  pirates$age
## t = 14.5068, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
##  27.28634 28.00166
## sample estimates:
## mean of x 
##    27.644

test.result$statistic

##       t 
## 14.5068

test.result$p.value

## [1] 2.044261e-43

test.result$conf.int

## [1] 27.28634 28.00166
## attr(,"conf.level")
## [1] 0.95

#conclusion: the p-value is below 0.05, so we can conclude that the mean age of pirates (27.64) is significantly different from 25.
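Rather than retyping numbers from the printed output, the elements of the test object can be pasted straight into a report string. The following is a minimal sketch that uses simulated ages in place of the pirates dataframe; apa_t is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: build an APA-style summary from a t.test() result.
apa_t <- function(res) {
  p <- res$p.value
  # APA drops the leading zero of p and reports very small values as "< .001"
  p.text <- if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("t(%s) = %.2f, %s, 95%% CI [%.2f, %.2f]",
          format(round(unname(res$parameter), 2)),  # degrees of freedom
          unname(res$statistic),                    # t statistic
          p.text,
          res$conf.int[1], res$conf.int[2])
}

set.seed(1)
ages <- rnorm(100, mean = 27, sd = 5)  # simulated stand-in for pirates$age
res <- t.test(x = ages, mu = 25)
apa_t(res)
```

The same helper also works on two-sample results, because it only reads the statistic, parameter, p.value and conf.int elements that every t.test() result carries.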

3. Conduct a one-sample t-test to test whether or not the mean number of parrots owned by pirates is different from 2.7. What is the test statistic, p-value, and 95% confidence interval? (Note: access these directly from the object, don’t type them manually). Write the conclusion using APA style.

test.result <- t.test(x = pirates$parrots.lifetime, 
                      mu = 2.7, 
                      alternative = "two.sided"
                      )
test.result

## 
##  One Sample t-test
## 
## data:  pirates$parrots.lifetime
## t = 0.7298, df = 999, p-value = 0.4657
## alternative hypothesis: true mean is not equal to 2.7
## 95 percent confidence interval:
##  2.586837 2.947163
## sample estimates:
## mean of x 
##     2.767

Conclusion: t(999) = 0.73, p = 0.47, 95% CI: [2.59, 2.95] -> The mean number of parrots is not statistically different from 2.7.

4. A pirate from Captain Chunk’s Canon Crew (CCCC) claims that pirates from his college have faster sword speeds than pirates from Jack Sparrow’s School of Fashion and Piratry (JSSFP). Test this claim by conducting the appropriate (one-tailed!) two-sample test and report the result using APA format.

swords.CCCC <- subset(pirates, subset = college == "CCCC")$sword.speed
swords.JSSFP <- subset(pirates, subset = college == "JSSFP")$sword.speed

test.result <- t.test(x = swords.CCCC,
                      y = swords.JSSFP,
                      alternative = "greater"
                      )
test.result

## 
##  Welch Two Sample t-test
## 
## data:  swords.CCCC and swords.JSSFP
## t = -1.4524, df = 540.345, p-value = 0.9265
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.5732584        Inf
## sample estimates:
## mean of x mean of y 
##  1.068027  1.336603

Conclusion: t(540.35) = -1.45, p = 0.93, 95% CI: [-0.57, Inf] -> Pirates who went to CCCC do not have a significantly faster sword speed than pirates who went to JSSFP.

5. According to a recent blog post on Piratebook, pirates whose favorite pirate is Blackbeard have more tattoos than pirates whose favorite pirate is Jack Sparrow. Test this claim by conducting the appropriate test and reporting the result in APA format. Important! Do this test once using the t.test(x, y) notation, and once using the t.test(formula, data) notation.

tattoos.Blackbeard <- subset(pirates, subset = favorite.pirate == "Blackbeard")$tattoos

tattoos.JackSparrow <- subset(pirates, subset = favorite.pirate == "Jack Sparrow")$tattoos

test.result <- t.test(x = tattoos.Blackbeard,
                      y = tattoos.JackSparrow,
                      alternative = "greater"
                      )
test.result

## 
##  Welch Two Sample t-test
## 
## data:  tattoos.Blackbeard and tattoos.JackSparrow
## t = 0.0333, df = 137.3, p-value = 0.4867
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.6302564        Inf
## sample estimates:
## mean of x mean of y 
##  9.620000  9.607064

test.result2 <- t.test(formula = tattoos ~ favorite.pirate, 
                      subset = favorite.pirate %in% c("Blackbeard", "Jack Sparrow"), 
                      data = pirates, 
                      alternative = "greater"
                      )

test.result2

## 
##  Welch Two Sample t-test
## 
## data:  tattoos by favorite.pirate
## t = 0.0333, df = 137.3, p-value = 0.4867
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.6302564        Inf
## sample estimates:
##   mean in group Blackbeard mean in group Jack Sparrow 
##                   9.620000                   9.607064

Conclusion: t(137.3) = 0.03, p = 0.49, 95% CI: [-0.63, Inf] -> Pirates whose favorite pirate is Blackbeard do not have significantly more tattoos than pirates who like Jack Sparrow best.
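The two notations give the same result because the formula method orders the groups alphabetically, so "Blackbeard" plays the role of x and "Jack Sparrow" the role of y. A minimal sketch on simulated data (the toy dataframe and its values are made up for illustration):

```r
set.seed(2)
# Simulated stand-in for the pirates dataframe, with a third group to filter out
toy <- data.frame(
  favorite.pirate = rep(c("Blackbeard", "Jack Sparrow", "Hook"),
                        times = c(30, 50, 20)),
  tattoos = rpois(100, lambda = 9),
  stringsAsFactors = FALSE
)

# x, y notation: x is the group claimed to have the larger mean
res.xy <- t.test(x = toy$tattoos[toy$favorite.pirate == "Blackbeard"],
                 y = toy$tattoos[toy$favorite.pirate == "Jack Sparrow"],
                 alternative = "greater")

# formula notation: keep only the two groups being compared
two.groups <- subset(toy, favorite.pirate %in% c("Blackbeard", "Jack Sparrow"))
res.formula <- t.test(tattoos ~ favorite.pirate,
                      data = two.groups,
                      alternative = "greater")

# Both calls should agree on the statistic and the p-value
all.equal(unname(res.xy$statistic), unname(res.formula$statistic))
all.equal(res.xy$p.value, res.formula$p.value)
```

If the group claimed to be larger came second alphabetically, the formula version would need alternative = "less" instead.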

6. Is there a relationship between a pirate’s age and the number of treasure chests he/she’s found? Test this by conducting the appropriate test and report your results in APA format.

test.result <- cor.test(x = pirates$age,
                        y = pirates$tchests.found
                        )
test.result

## 
##  Pearson's product-moment correlation
## 
## data:  pirates$age and pirates$tchests.found
## t = 2.8263, df = 998, p-value = 0.004802
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02726784 0.15027323
## sample estimates:
##        cor 
## 0.08911029

Conclusion: r = 0.09, t(998) = 2.83, p < 0.01, 95% CI: [0.03, 0.15] -> There is a significant (but weak) positive correlation between pirates’ age and treasure chests found.
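The t statistic reported by cor.test() follows directly from the correlation and its degrees of freedom, t = r * sqrt(df) / sqrt(1 - r^2), which is why a small r can still be significant with about 1000 observations. A minimal sketch on simulated data (the vectors below are made up and only stand in for the age and tchests.found columns):

```r
set.seed(3)
age <- rnorm(200, mean = 27, sd = 5)   # simulated ages
tchests <- rpois(200, lambda = 3)      # simulated treasure chest counts

res <- cor.test(x = age, y = tchests)

r <- unname(res$estimate)    # Pearson correlation
df <- unname(res$parameter)  # n - 2

# Recompute the test statistic by hand from r and df
t.manual <- r * sqrt(df) / sqrt(1 - r^2)
all.equal(t.manual, unname(res$statistic))
```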

7. Repeat the previous test just for pirates who have owned less than 10 parrots and whose favorite pirate is Jack Sparrow. Report your results in APA format.

parrots.JackSparrow <- subset(pirates, subset = favorite.pirate == "Jack Sparrow" &
                                parrots.lifetime < 10)


test.result <- cor.test(x = parrots.JackSparrow$age,
                        y = parrots.JackSparrow$tchests.found
                        )
test.result

## 
##  Pearson's product-moment correlation
## 
## data:  parrots.JackSparrow$age and parrots.JackSparrow$tchests.found
## t = 1.88, df = 437, p-value = 0.06077
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.004053305  0.181639084
## sample estimates:
##        cor 
## 0.08957122

Conclusion: r = 0.09, t(437) = 1.88, p = 0.06, 95% CI: [-0.00, 0.18] -> Among pirates whose favorite pirate is “Jack Sparrow” and who have owned fewer than 10 parrots, there is no significant correlation between age and the number of treasure chests found.

8. Is there a relationship between the college a pirate went to and his favorite pirate? Test this by conducting the appropriate test and report your results in APA format.

test.result <- with(pirates,
                    chisq.test(x = college,
                               y = favorite.pirate)
                    )
test.result

## 
##  Pearson's Chi-squared test
## 
## data:  college and favorite.pirate
## X-squared = 44.5956, df = 5, p-value = 1.753e-08

Conclusion: χ²(5, N = 1000) = 44.60, p < 0.01 -> There is a significant relationship between the college a pirate went to and his favorite pirate.
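As with the t-tests, the numbers for a chi-squared report can be read from the test object: statistic, parameter (the degrees of freedom), p.value and the observed table. A minimal sketch on simulated data (both vectors are made up and drawn independently of each other):

```r
set.seed(4)
# Simulated stand-ins for pirates$college and pirates$favorite.pirate
college <- sample(c("CCCC", "JSSFP"), size = 300, replace = TRUE)
favorite <- sample(c("Blackbeard", "Hook", "Jack Sparrow"),
                   size = 300, replace = TRUE)

res <- chisq.test(x = college, y = favorite)

# Build the report string from the object instead of typing values by hand
sprintf("X-squared(%d, N = %d) = %.2f, p = %.3f",
        as.integer(res$parameter),       # df = (rows - 1) * (columns - 1)
        as.integer(sum(res$observed)),   # total number of observations
        unname(res$statistic),
        res$p.value)
```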

9. Using the results of your test in question 8 (specifically the observed frequency table), create a new table showing the proportion of pirates in each school that like each pirate. Your final table should look like this:

test.result <- with(pirates,
                    chisq.test(x = college,
                               y = favorite.pirate)
                    )


observedtable <- test.result$observed
sum(observedtable[1, 1:6])

## [1] 637

observedtable[2, 1:6]

##     Anicetus   Blackbeard   Edward Low         Hook Jack Sparrow 
##           53           46           50           49          115 
##   Lewis Scot 
##           50

x <- observedtable[1, 1:6] / sum(observedtable[1, 1:6])
y <- observedtable[2, 1:6] / sum(observedtable[2, 1:6])

x

##     Anicetus   Blackbeard   Edward Low         Hook Jack Sparrow 
##   0.10518053   0.08477237   0.10047096   0.10361068   0.53061224 
##   Lewis Scot 
##   0.07535322

y

##     Anicetus   Blackbeard   Edward Low         Hook Jack Sparrow 
##    0.1460055    0.1267218    0.1377410    0.1349862    0.3168044 
##   Lewis Scot 
##    0.1377410
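The two row-wise divisions above can be done for the whole table in one step with prop.table() and margin = 1, which divides every cell by its row total and returns a single table rather than two separate vectors. A minimal sketch, using a small made-up frequency table in place of test.result$observed:

```r
# Made-up observed frequencies (rows: college, columns: favorite pirate)
observed <- matrix(c(10, 30,
                     20, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(college = c("CCCC", "JSSFP"),
                                   favorite.pirate = c("Blackbeard", "Jack Sparrow")))

# margin = 1 converts counts to proportions within each row (college)
props <- prop.table(observed, margin = 1)
round(props, 2)

# Each row of proportions should sum to 1
rowSums(props)
```

With the real data, the equivalent call would be prop.table(test.result$observed, margin = 1).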

10. Create the following two histograms of the age of pirates: one for pirates who have found less than 5 treasure chests and one for those who have found 5 or more treasure chests. For each histogram, add low-level plotting elements (e.g., points and lines) showing the sample mean and 95% CI for the mean age of pirates in that group (Hint: conduct the appropriate one-sample t-test, access the 95% CI from the test, then add the low-level plotting elements with points() and segments()).

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
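The hint for the histogram question can be sketched as follows for a single group. This is an illustration on simulated ages, not the hidden code that produced the original plots; the vertical position of the interval, the colors and the plot title are arbitrary choices.

```r
set.seed(5)
ages <- rnorm(150, mean = 27, sd = 5)  # simulated stand-in for one group's ages

# One-sample t-test, used here only to obtain the mean and its 95% CI
res <- t.test(x = ages)
ci <- res$conf.int
m <- unname(res$estimate)

# High-level plot first, then low-level elements on top of it
h <- hist(ages, main = "Age (simulated group)", xlab = "Age", col = "gray90")
y.pos <- max(h$counts) / 2  # arbitrary height at which to draw the interval

segments(x0 = ci[1], y0 = y.pos, x1 = ci[2], y1 = y.pos, lwd = 3)  # 95% CI
points(x = m, y = y.pos, pch = 16, cex = 1.5)                      # sample mean
```

For the two groups in the question, the same steps would be repeated on subset(pirates, tchests.found < 5)$age and subset(pirates, tchests.found >= 5)$age.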