Make up a vector of 50 random Legolas actors, with mean height of 195cm, and a standard deviation of 15cm. Run a t-test to compare this sample of actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?
legolas = rnorm(50, mean=195, sd=15)
aragorn = rnorm(50, mean=180, sd=10)
t.test(legolas, aragorn, alternative="two.sided")
##
## Welch Two Sample t-test
##
## data: legolas and aragorn
## t = 6.9926, df = 92.552, p-value = 4.124e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.93180 23.19077
## sample estimates:
## mean of x mean of y
## 198.6301 180.5688
legolas = rnorm(50, mean=195, sd=15)
gimli = rnorm(50, mean=132, sd=15)
t.test(legolas, gimli, alternative="two.sided")
##
## Welch Two Sample t-test
##
## data: legolas and gimli
## t = 21.462, df = 96.825, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 56.93278 68.53588
## sample estimates:
## mean of x mean of y
## 196.2789 133.5446
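There is evidence for significant differences in both comparisons. For Legolas versus Aragorn, t = 6.99 and p = 4.1e-10, with a 95% confidence interval for the difference in means of roughly 12.9 to 23.2 cm. For Legolas versus Gimli, t = 21.46 and p < 2.2e-16, with an interval of roughly 56.9 to 68.5 cm. Both p-values are far below 0.05 and neither interval includes 0, so we can reject the null hypothesis of equal means in each case; the difference is simply much larger for the Gimlis.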
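Because the samples are random, the exact numbers above change on every run. A minimal sketch of a reproducible version follows; the seed value and the names `res_aragorn`, `res_gimli`, and `p_values` are illustrative assumptions, not part of the original exercise. It also draws the Legolas sample once and reuses it for both comparisons.

```r
# Fix the random seed so the simulated heights are the same on every run.
# The seed value 42 is arbitrary.
set.seed(42)

legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)

# Welch two-sample t-tests, reusing the same Legolas sample for both
res_aragorn <- t.test(legolas, aragorn, alternative = "two.sided")
res_gimli   <- t.test(legolas, gimli, alternative = "two.sided")

# Pull the p-values out of the test objects instead of reading the printout
p_values <- c(aragorn = res_aragorn$p.value, gimli = res_gimli$p.value)
print(p_values)
print(p_values < 0.05)  # TRUE means the difference is significant at the 5% level
```

Reading `$p.value` from the returned object avoids misreading scientific notation such as `4.124e-10` in the printed output.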
Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?
var.test(legolas, gimli)
##
## F test to compare two variances
##
## data: legolas and gimli
## F = 0.80155, num df = 49, denom df = 49, p-value = 0.4416
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4548595 1.4124791
## sample estimates:
## ratio of variances
## 0.8015482
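Here the p-value (0.44) is well above 0.05 and the 95% confidence interval for the ratio of variances (0.45 to 1.41) includes 1, so there is no evidence that the two groups differ in variance. This is what we would expect, since both samples were generated with a standard deviation of 15.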
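To make it clearer what the F-test is doing, the statistic and p-value can be rebuilt by hand. This is a sketch on freshly simulated data (the seed and the names `f_ratio`, `p_lower`, and `p_manual` are assumptions for illustration), so the numbers will not match the output above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
legolas <- rnorm(50, mean = 195, sd = 15)
gimli   <- rnorm(50, mean = 132, sd = 15)

# The F statistic is the ratio of the two sample variances
f_ratio <- var(legolas) / var(gimli)
df1 <- length(legolas) - 1
df2 <- length(gimli) - 1

# Two-sided p-value: twice the smaller tail area of the F distribution
p_lower  <- pf(f_ratio, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
res <- var.test(legolas, gimli)
print(c(manual = p_manual, var.test = res$p.value))
```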
Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?
iris = read.csv("iris.csv")
iris$Species = factor(iris$Species)
setosa = iris$Species == "setosa"
irisS = iris[setosa, ]
cor.test(irisS$Sepal.Length, irisS$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: irisS$Sepal.Length and irisS$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5851391 0.8460314
## sample estimates:
## cor
## 0.7425467
versicolor = iris$Species == "versicolor"
irisVe = iris[versicolor, ]
cor.test(irisVe$Sepal.Length, irisVe$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: irisVe$Sepal.Length and irisVe$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2900175 0.7015599
## sample estimates:
## cor
## 0.5259107
virginica = iris$Species == "virginica"
irisVi = iris[virginica, ]
cor.test(irisVi$Sepal.Length, irisVi$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: irisVi$Sepal.Length and irisVi$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049657 0.6525292
## sample estimates:
## cor
## 0.4572278
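Yes. Within each species, sepal length and sepal width are significantly positively correlated: setosa r = 0.74 (p = 6.7e-10), versicolor r = 0.53 (p = 8.8e-05) and virginica r = 0.46 (p = 0.0008). The relationship is strongest in setosa and moderate in the other two species, and none of the 95% confidence intervals include 0.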
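The three per-species tests can also be run in one pass, which makes the results easier to compare side by side. The sketch below assumes that R's built-in `datasets::iris` has the same columns as `iris.csv` (`Sepal.Length`, `Sepal.Width`, `Species`); the names `by_species` and `results` are illustrative.

```r
# Split the built-in iris data into one data frame per species
by_species <- split(datasets::iris, datasets::iris$Species)

# Run cor.test() within each species and keep the estimate and p-value
results <- t(sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(ct$estimate), p_value = ct$p.value)
}))

print(signif(results, 4))
```

Each row of `results` is one species, so a low `p_value` together with a positive `r` indicates a significant positive correlation for that species.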
Using the deer dataset and the chisq.test() function, test:
- If there are significant differences in the number of deer caught per month
- If the cases of tuberculosis are uniformly distributed across all farms
deer = read.csv("Deer.csv")
table(deer$Month)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 256 165 27 3 2 35 11 19 58 168 189 188
chisq.test(table(deer$Month))
##
## Chi-squared test for given probabilities
##
## data: table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16
table(deer$Tb, deer$Farm)
##
## AL AU BA BE CB CRC HB LCV LN MAN MB MO NC NV PA PN QM RF RN
## 0 10 23 67 7 88 4 22 0 28 27 16 186 24 18 11 39 67 23 21
## 1 3 0 5 0 3 0 1 1 6 24 5 31 4 1 0 0 7 1 0
##
## RO SAL SAU SE TI TN VISO VY
## 0 31 0 3 16 9 16 13 15
## 1 0 1 0 10 0 2 1 4
chisq.test(table(deer$Tb, deer$Farm))
## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect
##
## Pearson's Chi-squared test
##
## data: table(deer$Tb, deer$Farm)
## X-squared = 129.09, df = 26, p-value = 1.243e-15
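There are significant differences in the number of deer caught per month (X-squared = 997.07, df = 11, p < 2.2e-16): catches are concentrated between October and February and are very low in April and May. The second test is also significant (X-squared = 129.09, df = 26, p = 1.2e-15), so the TB cases are not uniformly distributed across farms. R warns that the chi-squared approximation may be incorrect because many farms have very small expected counts, so this p-value should be treated as approximate, although it is far too small for the conclusion to be in doubt.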
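The sketch below rebuilds both tests from the counts printed in the tables above, so it runs without `Deer.csv`. It recomputes the monthly statistic by hand, counts how many expected cell counts fall below 5 (the usual reason for the warning), and reruns the farm test with a Monte Carlo p-value as one possible way around the approximation. The seed, the choice of `B = 2000`, and all variable names are assumptions for illustration.

```r
# Monthly catches, copied from table(deer$Month)
month_counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(month_counts) <- 1:12

# Under the null hypothesis every month has the same expected count
expected  <- rep(sum(month_counts) / 12, 12)
x2_manual <- sum((month_counts - expected)^2 / expected)
res_month <- chisq.test(month_counts)
print(c(manual = x2_manual, chisq.test = unname(res_month$statistic)))

# TB status by farm, copied from table(deer$Tb, deer$Farm)
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
            67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
            7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tb_farm <- rbind("0" = tb_neg, "1" = tb_pos)
colnames(tb_farm) <- farms

# Count the cells whose expected count is below 5
res_farm <- suppressWarnings(chisq.test(tb_farm))
n_small  <- sum(res_farm$expected < 5)
print(n_small)

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)  # arbitrary seed
res_sim <- chisq.test(tb_farm, simulate.p.value = TRUE, B = 2000)
print(res_sim$p.value)
```

If the simulated p-value is also very small, the conclusion that TB cases differ between farms does not depend on the approximation that triggered the warning.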