Module 9 Code and Answers

Author

u1535008

Question 1

Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

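Each actor group is simulated as 50 random draws from a normal distribution with its own mean and standard deviation. No seed is set, so the values change every time the code is run.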

aragorn <- rnorm(50, mean = 180, sd = 10)
gimli <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)
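Because rnorm() draws new values on every run, the numbers reported below will differ from one execution to the next. A minimal sketch of how the draws could be made repeatable, assuming an arbitrary seed of 42 (the seed and the variable names are illustrative, not part of the original analysis, and they would not reproduce the output shown in this document):

```r
# Hypothetical seed; any fixed integer works.
set.seed(42)
first_draw <- rnorm(50, mean = 180, sd = 10)

# Resetting the same seed reproduces exactly the same sample.
set.seed(42)
second_draw <- rnorm(50, mean = 180, sd = 10)

identical(first_draw, second_draw)
```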

Now we will run the first t-test between Legolas and Aragorn actors.

t.test(legolas, aragorn, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and aragorn
t = 5.5883, df = 81.67, p-value = 2.941e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  9.96490 20.98215
sample estimates:
mean of x mean of y 
 196.0648  180.5913 

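The p-value (2.941e-07) is far below 0.05, so we reject the null hypothesis of equal means: the Legolas and Aragorn actors differ significantly. The 95 percent confidence interval puts the Legolas mean roughly 10 to 21 units above the Aragorn mean.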
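The object returned by t.test() stores the values printed above, so the decision can be checked in code rather than read off the printout. A minimal sketch, assuming freshly simulated stand-in samples x and y and an arbitrary seed (neither reproduces the exact numbers above):

```r
set.seed(1)  # hypothetical seed, only so this sketch is repeatable
x <- rnorm(50, mean = 195, sd = 15)  # stand-in for the legolas sample
y <- rnorm(50, mean = 180, sd = 10)  # stand-in for the aragorn sample

res <- t.test(x, y, alternative = "two.sided")

res$p.value          # p-value as a plain number
res$conf.int         # 95 percent confidence interval for the difference in means
res$estimate         # the two sample means
res$p.value < 0.05   # TRUE when the difference is significant at the 5 percent level
```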

Next we will compare the Legolas actors with the Gimli actors.

t.test(legolas, gimli, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and gimli
t = 21.57, df = 91.126, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.77774 69.49858
sample estimates:
mean of x mean of y 
 196.0648  132.4267 

The p-value here (< 2.2e-16) is even smaller than in the previous test, so the difference between the Legolas and Gimli actors is also significant. The 95 percent confidence interval for the difference in means runs from about 58 to 69.

Question 2

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

var.test(gimli, legolas)

    F test to compare two variances

data:  gimli and legolas
F = 0.56906, num df = 49, denom df = 49, p-value = 0.05112
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3229254 1.0027829
sample estimates:
ratio of variances 
         0.5690554 

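The p-value (0.05112) is slightly greater than 0.05, so we fail to reject the null hypothesis of equal variances, although only narrowly: the 95 percent confidence interval for the ratio (0.32 to 1.003) barely includes 1. This agrees with how the data were simulated, since both groups were drawn with sd = 15.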
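The F statistic reported by var.test() is simply the ratio of the two sample variances, and the two-sided p-value comes from the F distribution with 49 and 49 degrees of freedom. A minimal sketch of that calculation, assuming stand-in samples g and l and an arbitrary seed (so the numbers will not match the output above):

```r
set.seed(2)  # hypothetical seed, only so this sketch is repeatable
g <- rnorm(50, mean = 132, sd = 15)  # stand-in for the gimli sample
l <- rnorm(50, mean = 195, sd = 15)  # stand-in for the legolas sample

# F statistic: ratio of the sample variances
f_by_hand <- var(g) / var(l)

# Two-sided p-value: twice the smaller tail of the F distribution
lower_tail <- pf(f_by_hand, df1 = length(g) - 1, df2 = length(l) - 1)
p_by_hand <- 2 * min(lower_tail, 1 - lower_tail)

res <- var.test(g, l)
all.equal(unname(res$statistic), f_by_hand)
all.equal(res$p.value, p_by_hand)
```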

Question 3

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset but for the three individual species. Are these correlated?

Reading in the dataset:

iris <- read.csv("iris.csv")
table(iris$Species)

    setosa versicolor  virginica 
        50         50         50 

First, the setosa species:

iris_set <- subset(iris, Species == "setosa")
cor.test(iris_set$Sepal.Length, iris_set$Sepal.Width)

    Pearson's product-moment correlation

data:  iris_set$Sepal.Length and iris_set$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467 

Next, the versicolor species:

iris_vers <- subset(iris, Species == "versicolor")
cor.test(iris_vers$Sepal.Length, iris_vers$Sepal.Width)

    Pearson's product-moment correlation

data:  iris_vers$Sepal.Length and iris_vers$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107 

Finally, the virginica species:

iris_vir <- subset(iris, Species == "virginica")
cor.test(iris_vir$Sepal.Length, iris_vir$Sepal.Width)

    Pearson's product-moment correlation

data:  iris_vir$Sepal.Length and iris_vir$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278 

Yes. Sepal length and sepal width are significantly positively correlated within each species (all three p-values are below 0.05), but the strength of the correlation differs by species. The setosa iris has the highest correlation of the three (0.7425467), the versicolor iris has a correlation of 0.5259107, and the virginica iris has the lowest (0.4572278).
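The three subset-and-test steps above can also be run in one pass by splitting the data by species. A minimal sketch, assuming R's built-in datasets::iris as a stand-in for iris.csv (the names flowers, by_species, and cors are illustrative):

```r
# Assumption: the built-in data set stands in for the iris.csv file.
flowers <- datasets::iris

# One data frame per species
by_species <- split(flowers, flowers$Species)

# Run cor.test() on each species and keep the estimate and p-value
cors <- sapply(by_species, function(d) {
  res <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(cor = unname(res$estimate), p.value = res$p.value)
})

# Rows are "cor" and "p.value"; columns are the species
round(cors, 4)
```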

Question 4

Using the deer dataset and the chisq.test() function, test (1) if there are significant differences in the number of deer caught per month and (2) if the cases of tuberculosis are uniformly distributed across all farms.

Part 1

deer <- read.csv("Deer.csv")
table(deer$Month)

  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188 

chisq.test(table(deer$Month))

    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

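Because the p-value is much less than 0.05, we reject the null hypothesis that deer are caught in equal numbers every month. The counts differ significantly by month: most deer were caught from October through February, and very few in April and May.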
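With a single table of counts, chisq.test() compares the observed counts against equal expected counts in every category. A minimal sketch of that calculation by hand, using the monthly counts copied from the table above (the names caught, expected, x2, and p are illustrative):

```r
# Monthly counts copied from the table above (months 1 to 12)
caught <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(caught) / length(caught), length(caught))

# Chi-squared statistic and its p-value on (number of months - 1) degrees of freedom
x2 <- sum((caught - expected)^2 / expected)
p <- pchisq(x2, df = length(caught) - 1, lower.tail = FALSE)

round(x2, 2)  # agrees with the X-squared value reported by chisq.test()
```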

Part 2

table(deer$Farm, deer$Tb)
      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(deer$Farm, deer$Tb))

Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

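Because the p-value is much less than 0.05, we reject the null hypothesis that tuberculosis status is independent of farm: the cases are not evenly distributed across farms. The warning appears because several farms have very small expected counts, so the chi-squared approximation, and therefore the exact p-value, should be treated with some caution.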
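The warning printed with the test typically appears when some expected cell counts are below 5. One way to check the conclusion is a Monte Carlo p-value, which chisq.test() supports through its simulate.p.value argument. A minimal sketch, assuming the counts are retyped from the table above rather than read from Deer.csv, with an arbitrary seed and number of replicates:

```r
# Counts copied from the farm by Tb table above
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18,
            11, 39, 67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1,
            0, 0, 7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
counts <- cbind("0" = tb_neg, "1" = tb_pos)
rownames(counts) <- farms

# Expected counts under independence; many cells fall below 5
expected <- suppressWarnings(chisq.test(counts))$expected
sum(expected < 5)

# Monte Carlo p-value that does not rely on the chi-squared approximation
set.seed(123)  # hypothetical seed, only so this sketch is repeatable
sim <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

The simulated p-value cannot go below 1 / (B + 1), so it will not show a value as extreme as the one printed above, but it indicates whether the conclusion holds without the approximation.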