Wulff Module 09

Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

Null H0: The two groups of actors have the same mean height

Alternative Ha: The two groups of actors have different mean heights

legolas = rnorm(50, mean=195, sd=15)
aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)

t.test(legolas, aragorn, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and aragorn
## t = 4.5041, df = 94.139, p-value = 1.911e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.719479 17.313748
## sample estimates:
## mean of x mean of y 
##  194.1439  182.1273

t.test(legolas, gimli, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and gimli
## t = 20.398, df = 97.775, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  55.21154 67.11275
## sample estimates:
## mean of x mean of y 
##  194.1439  132.9818

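In both of these tests the p-value is far below 0.05 (p = 1.911e-05 for Legolas vs. Aragorn and p < 2.2e-16 for Legolas vs. Gimli). These values are very small but not exactly zero. We can therefore reject the null hypothesis in both cases: the mean height of the Legolas actors differs significantly from that of the Aragorns and from that of the Gimlis.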
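R stores the exact p-value in the object returned by t.test(), so it does not have to be read off the printed output. The following is a minimal sketch of that idea. The seed value (42) and the name `res` are assumptions added for illustration; the original run did not set a seed, so the numbers produced here will not match the output shown above.

```r
# Sketch only: the seed is an assumption, chosen so the example is repeatable.
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

# Welch two-sample t-test, stored instead of printed
res <- t.test(legolas, aragorn, alternative = "two.sided")

res$p.value                           # p-value as a plain number
format.pval(res$p.value, digits = 3)  # p-value formatted for reporting
res$conf.int                          # 95% CI for the difference in means
```

Reporting the stored value (or "p < 2.2e-16" when R prints it that way) avoids stating that a p-value equals zero.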

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

var.test(gimli,legolas)

## 
##  F test to compare two variances
## 
## data:  gimli and legolas
## F = 1.1009, num df = 49, denom df = 49, p-value = 0.7379
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6247186 1.9399438
## sample estimates:
## ratio of variances 
##           1.100872

In this test, the p-value (0.7379) is above 0.05, so we cannot reject the null hypothesis that the ratio of variances is 1. There is no evidence that the Gimli and Legolas groups have different variances.
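The F statistic reported by var.test() is the ratio of the two sample variances, which can be checked by hand. The sketch below assumes a seed of 42 and the names `ft` and `f_by_hand`, none of which appear in the original; because the data are re-simulated, the values will differ from the output above.

```r
# Sketch only: seed and object names are assumptions for illustration.
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

ft <- var.test(gimli, legolas)

# The F statistic is the ratio of the sample variances (x over y)
f_by_hand <- var(gimli) / var(legolas)
all.equal(unname(ft$statistic), f_by_hand)  # TRUE

ft$p.value   # p-value of the two-sided F test
ft$conf.int  # 95% CI for the ratio of variances
```

If the confidence interval contains 1, the data are consistent with equal variances at the 5% level.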

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

test_setosa <- cor.test(iris[iris$Species == "setosa", "Sepal.Length"], 
   iris[iris$Species == "setosa", "Sepal.Width"])
print(test_setosa)

## 
##  Pearson's product-moment correlation
## 
## data:  iris[iris$Species == "setosa", "Sepal.Length"] and iris[iris$Species == "setosa", "Sepal.Width"]
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

test_versicolor <- cor.test(iris[iris$Species == "versicolor", "Sepal.Length"], 
   iris[iris$Species == "versicolor", "Sepal.Width"])
print(test_versicolor)

## 
##  Pearson's product-moment correlation
## 
## data:  iris[iris$Species == "versicolor", "Sepal.Length"] and iris[iris$Species == "versicolor", "Sepal.Width"]
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

test_virginica <- cor.test(iris[iris$Species == "virginica", "Sepal.Length"], 
   iris[iris$Species == "virginica", "Sepal.Width"])
print(test_virginica)

## 
##  Pearson's product-moment correlation
## 
## data:  iris[iris$Species == "virginica", "Sepal.Length"] and iris[iris$Species == "virginica", "Sepal.Width"]
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

All three of these p-values are less than 0.05, indicating that sepal length and sepal width are significantly correlated within each species. The species “Setosa” has a strong positive correlation of 0.7425467, while the species “Versicolor” has a moderate positive correlation of 0.5259107, and the species “Virginica” has a moderate positive correlation of 0.4572278.
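The three per-species tests repeat the same call, so they can also be run in one pass. This is a sketch of one possible approach using split() and sapply() on the built-in iris data; the name `cors` and the row labels `r` and `p` are assumptions chosen for this example.

```r
# Run cor.test() once per species and collect the estimate and p-value.
cors <- sapply(split(iris, iris$Species), function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(ct$estimate), p = ct$p.value)
})

# Rows: r (Pearson correlation) and p (p-value); columns: one per species
cors
```

The result is a small matrix, which makes it easy to compare the three species side by side.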

Using the deer dataset and the chisq.test() function, test:

If there are significant differences in the number of deer caught per month.

deer <- read.csv("deer.csv")
table(deer$Month)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16

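The p-value is extremely small (p < 2.2e-16), so we reject the null hypothesis that the same number of deer is caught in every month. The number of deer caught differs significantly between months.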
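With a single vector of counts, chisq.test() compares the observed counts to equal expected counts in every category. The sketch below recomputes the statistic by hand from the monthly counts printed above, so it does not need deer.csv. The names `caught`, `expected`, and `x2_by_hand` are assumptions for this example.

```r
# Monthly counts copied from the table(deer$Month) output above
caught <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(caught) <- 1:12

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(caught) / length(caught), length(caught))

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2_by_hand <- sum((caught - expected)^2 / expected)
x2_by_hand

# The same goodness-of-fit test via chisq.test()
res <- chisq.test(caught)
res$statistic
res$parameter  # degrees of freedom: number of months minus 1
```

Note that this null hypothesis treats all months as equally likely and ignores the small differences in month length.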

If the cases of tuberculosis are uniformly distributed across all farms.

table(deer$Tb, deer$Farm)

##    
##      AL  AU  BA  BE  CB CRC  HB LCV  LN MAN  MB  MO  NC  NV  PA  PN  QM  RF  RN
##   0  10  23  67   7  88   4  22   0  28  27  16 186  24  18  11  39  67  23  21
##   1   3   0   5   0   3   0   1   1   6  24   5  31   4   1   0   0   7   1   0
##    
##      RO SAL SAU  SE  TI  TN VISO  VY
##   0  31   0   3  16   9  16   13  15
##   1   0   1   0  10   0   2    1   4

chisq.test(table(deer$Tb, deer$Farm))

## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(deer$Tb, deer$Farm)
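## X-squared = 129.09, df = 26, p-value = 1.243e-15

The p-value (1.243e-15) is far below 0.05, so we reject the null hypothesis that tuberculosis cases are uniformly distributed across farms: the proportion of infected deer differs between farms. The warning appears because some farms have very few deer, so some expected counts are small and the chi-squared approximation may be unreliable.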
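R prints the "Chi-squared approximation may be incorrect" warning when some expected cell counts are small. One hedged way to check the result is to ask chisq.test() for a Monte Carlo p-value with simulate.p.value = TRUE. The sketch below rebuilds the table from the counts printed above instead of reading deer.csv; the seed, the number of replicates B, and the names `tb_neg`, `tb_pos`, `tab`, `asym`, and `sim` are assumptions for illustration.

```r
# Counts copied from the table(deer$Tb, deer$Farm) output above
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN",
           "RO", "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
            67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
            7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tab <- rbind("0" = tb_neg, "1" = tb_pos)
colnames(tab) <- farms

# Asymptotic test (the warning is suppressed here on purpose)
asym <- suppressWarnings(chisq.test(tab))
sum(asym$expected < 5)  # number of cells with a small expected count

# Monte Carlo p-value; seed and B are assumptions chosen for repeatability
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 9999)
sim$p.value
```

If the simulated p-value is also well below 0.05, the conclusion does not depend on the large-sample approximation that triggered the warning.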