Module 9 Code and Answers
Question 1
Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?
Here are the random distributions for each of these actor groups.
aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)
Now we will run the first t-test between Legolas and Aragorn actors.
t.test(legolas, aragorn, alternative = "two.sided")
Welch Two Sample t-test
data: legolas and aragorn
t = 5.5883, df = 81.67, p-value = 2.941e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
9.96490 20.98215
sample estimates:
mean of x mean of y
196.0648 180.5913
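Because the p-value (2.941e-07) is less than 0.05, we reject the null hypothesis of equal means: the Legolas actors (mean 196.06) differ significantly from the Aragorn actors (mean 180.59). The 95 percent confidence interval puts the difference between about 10 and 21.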
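The values printed by t.test() can also be read directly from the returned object, which makes the 0.05 decision explicit. The following is a minimal sketch; the set.seed() call is an assumption added here for repeatability, so its numbers will not match the output above, which came from unseeded samples.

```r
# Assumption: set.seed() is added only so this sketch is repeatable;
# the original samples were drawn without a seed.
set.seed(1)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

# Welch two-sample t-test (the default when var.equal is not set)
res <- t.test(legolas, aragorn, alternative = "two.sided")

res$p.value    # p-value of the test
res$conf.int   # 95 percent confidence interval for the difference in means
res$estimate   # the two sample means

# Decision at the 5 percent level
significant <- res$p.value < 0.05
significant
```

The same fields (p.value, conf.int, estimate) are available for the Legolas and Gimli comparison.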
Next we will compare the Legolas actors with the Gimli actors.
t.test(legolas, gimli, alternative = "two.sided")
Welch Two Sample t-test
data: legolas and gimli
t = 21.57, df = 91.126, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
57.77774 69.49858
sample estimates:
mean of x mean of y
196.0648 132.4267
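The p-value here is even smaller than in the previous test (below 2.2e-16), so we again reject the null hypothesis of equal means: the Legolas actors (mean 196.06) differ significantly from the Gimli actors (mean 132.43), with a 95 percent confidence interval for the difference of about 58 to 69.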
Question 2
Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?
var.test(gimli, legolas)
F test to compare two variances
data: gimli and legolas
F = 0.56906, num df = 49, denom df = 49, p-value = 0.05112
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3229254 1.0027829
sample estimates:
ratio of variances
0.5690554
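The p-value (0.05112) is just above 0.05, so we fail to reject the null hypothesis that the two groups have equal variance. The result is borderline: the 95 percent confidence interval for the ratio of variances (0.32 to 1.003) only barely includes 1.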
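To see where the F statistic and its p-value come from, the test can be reproduced by hand: F is the ratio of the two sample variances, and the two-sided p-value is twice the smaller tail probability of the F distribution. This is a sketch under the assumption of a fixed seed (not used in the original), so the numbers differ from the output above.

```r
# Assumption: set.seed() is added only for repeatability.
set.seed(1)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# F statistic: ratio of sample variances
f_stat <- var(gimli) / var(legolas)

# Degrees of freedom: sample size minus one for each group
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail probability
p_lower     <- pf(f_stat, df1, df2)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
res <- var.test(gimli, legolas)
c(by_hand = f_stat, var_test = unname(res$statistic))
c(by_hand = p_two_sided, var_test = res$p.value)
```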
Question 3
Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset but for the three individual species. Are these correlated?
Reading in the dataset:
iris <- read.csv("iris.csv")
table(iris$Species)
setosa versicolor virginica
50 50 50
First, the setosa species
iris_set <- subset(iris, Species == "setosa")
cor.test(iris_set$Sepal.Length, iris_set$Sepal.Width)
Pearson's product-moment correlation
data: iris_set$Sepal.Length and iris_set$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5851391 0.8460314
sample estimates:
cor
0.7425467
Next, the versicolor species
iris_vers <- subset(iris, Species == "versicolor")
cor.test(iris_vers$Sepal.Length, iris_vers$Sepal.Width)
Pearson's product-moment correlation
data: iris_vers$Sepal.Length and iris_vers$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2900175 0.7015599
sample estimates:
cor
0.5259107
Finally, the virginica species
iris_vir <- subset(iris, Species == "virginica")
cor.test(iris_vir$Sepal.Length, iris_vir$Sepal.Width)
Pearson's product-moment correlation
data: iris_vir$Sepal.Length and iris_vir$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2049657 0.6525292
sample estimates:
cor
0.4572278
Yes. Sepal length and sepal width are significantly positively correlated within each species (all three p-values are below 0.05). The strength of the correlation differs by species: setosa has the highest (r = 0.743), followed by versicolor (r = 0.526), and virginica has the lowest (r = 0.457).
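The three subset-and-test steps can also be written as a single pass over the species. This sketch assumes that R's built-in datasets::iris has the same columns as iris.csv (Sepal.Length, Sepal.Width, Species); it is referenced explicitly so it does not depend on the CSV file.

```r
# Assumption: datasets::iris stands in for the iris.csv file read above.
by_species <- split(datasets::iris, datasets::iris$Species)

# Pearson correlation of sepal length and width within each species
r_by_species <- sapply(by_species, function(d) {
  cor(d$Sepal.Length, d$Sepal.Width)
})

# Matching p-values from cor.test()
p_by_species <- sapply(by_species, function(d) {
  cor.test(d$Sepal.Length, d$Sepal.Width)$p.value
})

round(r_by_species, 3)
p_by_species < 0.05
```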
Question 4
Using the deer dataset and the chisq.test() function, test (1) if there are significant differences in the number of deer caught per month and (2) if the cases of tuberculosis are uniformly distributed across all farms.
Part 1
deer <- read.csv("Deer.csv")
table(deer$Month)
1 2 3 4 5 6 7 8 9 10 11 12
256 165 27 3 2 35 11 19 58 168 189 188
chisq.test(table(deer$Month))
Chi-squared test for given probabilities
data: table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16
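Because the p-value is much less than 0.05, we reject the null hypothesis that equal numbers of deer are caught each month. The counts are far from uniform: most deer were caught from October through February, and very few in April and May.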
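With a one-way table and no other arguments, chisq.test() compares the observed counts against equal expected counts in every category. As a sketch of that arithmetic, the statistic can be rebuilt from the monthly counts printed above (typed in by hand here, since Deer.csv is not assumed to be available):

```r
# Monthly counts copied from the table(deer$Month) output above
counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(counts) / length(counts), length(counts))

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2 <- sum((counts - expected)^2 / expected)
df <- length(counts) - 1

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(x2, 2)
df
p_value
```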
Part 2
table(deer$Farm, deer$Tb)
0 1
AL 10 3
AU 23 0
BA 67 5
BE 7 0
CB 88 3
CRC 4 0
HB 22 1
LCV 0 1
LN 28 6
MAN 27 24
MB 16 5
MO 186 31
NC 24 4
NV 18 1
PA 11 0
PN 39 0
QM 67 7
RF 23 1
RN 21 0
RO 31 0
SAL 0 1
SAU 3 0
SE 16 10
TI 9 0
TN 16 2
VISO 13 1
VY 15 4
chisq.test(table(deer$Farm, deer$Tb))
Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15
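Because the p-value is much less than 0.05, we reject the null hypothesis that tuberculosis status is independent of farm, meaning that the cases are not evenly distributed across farms. The warning appears because several farms have very few deer, which makes their expected counts small, so the chi-squared approximation should be treated with some caution.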
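One way to check whether the warning matters is to look at the expected counts and, as an alternative to the chi-squared approximation, use the Monte Carlo p-value that chisq.test() offers through simulate.p.value. The sketch below rebuilds the table from the counts printed above (typed in by hand); the seed and the number of replicates B are assumptions chosen for illustration.

```r
# Counts copied from the table(deer$Farm, deer$Tb) output above
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb0 <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
         67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb1 <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
         7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tab <- cbind("0" = tb0, "1" = tb1)
rownames(tab) <- farms

# Standard test; the warning is suppressed here because it is expected
approx_test <- suppressWarnings(chisq.test(tab))

# How many cells have an expected count below 5?
n_small <- sum(approx_test$expected < 5)
n_small

# Monte Carlo p-value (assumed seed and B, for illustration only)
set.seed(1)
sim_test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim_test$p.value
```

If the simulated p-value is also well below 0.05, the conclusion does not depend on the approximation that triggered the warning.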