Module 9 Assignment

Author

MJ Kemp

Read in and prepare data

Deer = read.csv("Deer.csv")
Iris = read.csv("iris.csv")
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)
legolas = rnorm(50, 195, 15)

Run t-tests to compare the Legolas actors to the Aragorn actors, and then to the Gimli actors

t.test(legolas, aragorn, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 6.7788, df = 76.866, p-value = 2.188e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 11.61540 21.27793
sample estimates:
mean of x mean of y 
 194.8305  178.3838

t.test(legolas, gimli, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 21.348, df = 97.842, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 56.87216 68.52935
sample estimates:
mean of x mean of y 
 194.8305  132.1297

Is there evidence of significant differences? Yes. The p-value is below 0.05 in both tests, so the mean height of the Legolas actors differs significantly from the mean height of the Aragorn actors and from that of the Gimli actors.
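
To make the Welch output easier to read, the statistic, degrees of freedom, and p-value can be rebuilt by hand. This is a minimal sketch, not part of the original assignment: the seed is an assumption added for reproducibility (the original draws were unseeded), so the numbers will differ from the output above.

```r
# Sketch: reproduce Welch's t statistic, df, and p-value by hand.
set.seed(42)  # assumed seed, not in the original; makes the draws repeatable
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

# Squared standard error of each sample mean
se2_l <- var(legolas) / length(legolas)
se2_a <- var(aragorn) / length(aragorn)

# Welch's t: difference in means divided by the combined standard error
t_manual <- (mean(legolas) - mean(aragorn)) / sqrt(se2_l + se2_a)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (se2_l + se2_a)^2 /
  (se2_l^2 / (length(legolas) - 1) + se2_a^2 / (length(aragorn) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with the built-in test
fit <- t.test(legolas, aragorn, alternative = "two.sided")
c(manual = t_manual, builtin = unname(fit$statistic))
c(manual = df_manual, builtin = unname(fit$parameter))
c(manual = p_manual, builtin = fit$p.value)
```

The non-integer degrees of freedom in the output above (for example 76.866) come from this approximation, which does not assume equal variances.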

Re-run variance test (F-test) to compare the group of Gimli and Legolas actors

var.test(gimli,legolas)


    F test to compare two variances

data:  gimli and legolas
F = 0.92275, num df = 49, denom df = 49, p-value = 0.7795
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5236394 1.6260617
sample estimates:
ratio of variances 
         0.9227513

Do these groups have different variance? No. The ratio of variances is close to 1 and the p-value is greater than 0.05, so there is no significant difference in variance between the two groups.
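
The F statistic is simply the ratio of the two sample variances, which the following sketch checks against var.test(). The seed is an assumption added for reproducibility, so the values will not match the output above.

```r
# Sketch: the F statistic as a ratio of sample variances.
set.seed(42)  # assumed seed, not in the original
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# Ratio of sample variances and its degrees of freedom
f_manual <- var(gimli) / var(legolas)
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail of the F distribution
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
fit <- var.test(gimli, legolas)
c(manual = f_manual, builtin = unname(fit$statistic))
c(manual = p_manual, builtin = fit$p.value)
```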

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species

cor(iris$Sepal.Length, iris$Sepal.Width)

[1] -0.1175698

cor.test(iris$Sepal.Length, iris$Sepal.Width)


    Pearson's product-moment correlation

data:  iris$Sepal.Length and iris$Sepal.Width
t = -1.4403, df = 148, p-value = 0.1519
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.27269325  0.04351158
sample estimates:
       cor 
-0.1175698

# Refer to the columns by bare name so each statistic is computed within its group;
# iris$Sepal.Length would pull the full, ungrouped column for every species
iris |> 
  group_by(Species) |> 
  summarise(avgSep.Length = mean(Sepal.Length), avg.Sep.Width = mean(Sepal.Width), correlation = cor(Sepal.Length, Sepal.Width))

# A tibble: 3 × 4
  Species    avgSep.Length avg.Sep.Width correlation
  <fct>              <dbl>         <dbl>       <dbl>
1 setosa              5.01          3.43       0.743
2 versicolor          5.94          2.77       0.526
3 virginica           6.59          2.97       0.457

Are these correlated? Yes, within each species. Sepal.Length and Sepal.Width are positively correlated in every species (0.743 for setosa, 0.526 for versicolor, 0.457 for virginica). The pooled correlation of -0.118 (p-value 0.1519, not significant) hides this relationship because it mixes the three species together.
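
A significance test for each species separately can be sketched as follows. This assumes the built-in iris data set (the one the calls above actually use) and references the columns inside each per-species subset, so every statistic is computed within one species only.

```r
# Sketch: run cor.test() separately for each species of the built-in iris data.
by_species <- split(iris, iris$Species)

results <- sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(correlation = unname(ct$estimate), p.value = ct$p.value)
})

# One row per species, with the correlation and its p-value
signif(t(results), 3)
```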

Test for significant differences in number of deer caught per month (using chisq.test())

table(Deer$Month)


  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(Deer$Month))


    Chi-squared test for given probabilities

data:  table(Deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

Is this significant? Yes. The p-value is far below the 0.05 threshold, so the number of deer caught is not uniform across months.
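
The goodness-of-fit statistic can be reproduced directly from the monthly counts in the table above. This sketch assumes the null hypothesis that chisq.test() uses by default for a one-way table, namely equal expected counts in every month.

```r
# Sketch: chi-squared goodness-of-fit statistic by hand, using the monthly counts above.
observed <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(observed) <- 1:12

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(observed) / length(observed), length(observed))

x2 <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(x2, 2)  # 997.07, matching the chisq.test() output above

# Pearson residuals show which months depart most from the uniform expectation
round((observed - expected) / sqrt(expected), 1)
```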

Determine if the cases of tuberculosis are uniformly distributed across all farms (using chisq.test())

table(Deer$Farm, Deer$Tb)

      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(Deer$Farm, Deer$Tb))

Warning in chisq.test(table(Deer$Farm, Deer$Tb)): Chi-squared approximation may
be incorrect


    Pearson's Chi-squared test

data:  table(Deer$Farm, Deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

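Is the difference in distribution significant? Yes, the p-value is far below the 0.05 threshold, so tuberculosis cases are not uniformly distributed across farms. Note, however, that the warning indicates some expected counts are too small for the chi-squared approximation to be fully reliable.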
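
One way to address the warning is to check how many expected counts fall below 5 and then estimate the p-value by Monte Carlo simulation instead of the chi-squared approximation. The sketch below rebuilds the farm-by-Tb table from the counts printed above; the seed and the number of replicates (B = 2000) are assumptions chosen for illustration.

```r
# Sketch: rebuild the farm-by-Tb table and use a simulated p-value.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb0 <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
         67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb1 <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
         7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tab <- matrix(c(tb0, tb1), ncol = 2, dimnames = list(farms, c("0", "1")))

# Expected counts under independence; small values trigger the warning
expected <- suppressWarnings(chisq.test(tab))$expected
sum(expected < 5)  # number of cells with an expected count below 5

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)  # assumed seed, for repeatable simulation
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$statistic
sim$p.value
```

With B replicates the smallest p-value the simulation can report is 1 / (B + 1), so a larger B is needed if a finer resolution matters.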