Module09_Gordon

Author

Mason Gordon

Module 09

  • Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?
aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)
legolas = rnorm(50, 195, 15)
t.test(legolas, aragorn, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and aragorn
t = 6.5541, df = 88.503, p-value = 3.604e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 11.53917 21.58073
sample estimates:
mean of x mean of y 
 197.4558  180.8958 
t.test(legolas, gimli, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and gimli
t = 21.503, df = 97.99, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.11855 68.73322
sample estimates:
mean of x mean of y 
 197.4558  134.5299 

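Both p-values are far below 0.05 (3.604e-09 for Legolas vs. Aragorn and < 2.2e-16 for Legolas vs. Gimli), so we reject the null hypothesis of equal mean height in both comparisons. The 95% confidence intervals for the difference in means (11.5 to 21.6 and 57.1 to 68.7) both exclude 0, so the Legolas actors are significantly taller than both the Aragorn and the Gimli actors.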
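As a check on what t.test() reports, the Welch statistic, its degrees of freedom, and the two-sided p-value can be recomputed by hand. This is only a sketch: the seed and the helper name welch_t are assumptions added for illustration, so the numbers will not match the output printed above.

```r
# Assumed seed for reproducibility; the original analysis did not set one.
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

# Hypothetical helper: Welch two-sample t-test computed from first principles.
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

manual  <- welch_t(legolas, aragorn)
builtin <- t.test(legolas, aragorn, alternative = "two.sided")

print(manual)
print(c(t = unname(builtin$statistic),
        df = unname(builtin$parameter),
        p.value = builtin$p.value))
```

The two printed vectors should agree, which confirms that the default t.test() does not assume equal variances.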

  • Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?
var.test(gimli, legolas)

    F test to compare two variances

data:  gimli and legolas
F = 1.0209, num df = 49, denom df = 49, p-value = 0.9426
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5793374 1.7990213
sample estimates:
ratio of variances 
          1.020902 

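The p-value (0.9426) is well above 0.05, so there is no evidence of a difference in variance between the two groups. The 95% confidence interval for the variance ratio (0.58 to 1.80) includes 1, which is consistent with this.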
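The F statistic is just the ratio of the two sample variances, so the var.test() result can be rebuilt in a few lines. A minimal sketch, assuming a seed that the original analysis did not set (so the values differ from the output above):

```r
# Assumed seed; not part of the original analysis.
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# F statistic: ratio of sample variances, numerator first.
f_stat <- var(gimli) / var(legolas)
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail of the F distribution.
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

builtin <- var.test(gimli, legolas)

print(c(F = f_stat, p.value = p_value))
print(c(F = unname(builtin$statistic), p.value = builtin$p.value))
```

Because both groups are simulated with sd = 15, a ratio near 1 is what we would expect most of the time.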

  • Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?
iris <- read.csv("../data/iris.csv")
library(dplyr)
Warning: package 'dplyr' was built under R version 4.4.2

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
iris %>%
  group_by(Species) %>%
  summarize(
    correlation = cor(Sepal.Length, Sepal.Width),
    p_value = cor.test(Sepal.Length, Sepal.Width)$p.value
  )
# A tibble: 3 × 3
  Species    correlation  p_value
  <chr>            <dbl>    <dbl>
1 setosa           0.743 6.71e-10
2 versicolor       0.526 8.77e- 5
3 virginica        0.457 8.43e- 4

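All three p-values are below 0.05, so sepal length and sepal width are significantly positively correlated within each species. The correlation is strongest in setosa (r = 0.743) and more moderate in versicolor (r = 0.526) and virginica (r = 0.457).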
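The same per-species test can be run in base R, which also makes it easy to keep the confidence interval that cor.test() returns. This sketch assumes that R's built-in datasets::iris holds the same measurements as the iris.csv file read above; the column names conf_low and conf_high are made up for illustration.

```r
# Assumption: the built-in iris data matches the CSV used in the report.
iris_builtin <- datasets::iris

# One data frame per species, in factor-level order.
by_species <- split(iris_builtin, iris_builtin$Species)

results <- do.call(rbind, lapply(names(by_species), function(sp) {
  d  <- by_species[[sp]]
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)  # Pearson by default
  data.frame(
    Species     = sp,
    correlation = unname(ct$estimate),
    p_value     = ct$p.value,
    conf_low    = ct$conf.int[1],   # lower bound of the 95% CI for r
    conf_high   = ct$conf.int[2]    # upper bound of the 95% CI for r
  )
}))

print(results)
```

Calling cor.test() once per group avoids computing the correlation twice, as the cor() plus cor.test() pair does.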

  • Using the deer dataset and the chisq.test() function, test:
    • If there are significant differences in the number of deer caught per month
deer = read.csv("../data/Deer.csv")
table(deer$Month)

  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188 
chisq.test(table(deer$Month))

    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

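The p-value (< 2.2e-16) is far below 0.05, so we reject the null hypothesis that the same number of deer is caught every month. The counts range from 2 in month 5 to 256 in month 1, and most captures fall in months 10 through 2.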
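With a single table of counts, chisq.test() compares the observed counts against equal expected counts in every month. The statistic can be reproduced directly from the monthly table printed above; the counts below are copied from that output, and the variable names are illustrative.

```r
# Monthly counts copied from the table(deer$Month) output.
counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(counts) <- 1:12

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(counts) / length(counts), length(counts))

# Pearson chi-squared statistic and its degrees of freedom.
chi_sq <- sum((counts - expected)^2 / expected)
df     <- length(counts) - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

builtin <- chisq.test(counts)

print(c(X_squared = chi_sq, df = df, p.value = p_value))
print(builtin)
```

Note that this null hypothesis ignores the fact that months differ slightly in length; passing a p argument to chisq.test() would be one way to account for that.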

    • If the cases of tuberculosis are uniformly distributed across all farms
table(deer$Farm, deer$Tb)
      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4
chisq.test(table(deer$Farm, deer$Tb))
Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
be incorrect

    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

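The small p-value (1.243e-15) indicates that tuberculosis cases are not uniformly distributed across farms: some farms have a much higher share of positive cases than others (for example, MAN with 24 of 51 and SE with 10 of 26). Because many farms contributed only a few deer, R warns that the chi-squared approximation may be incorrect, so the exact p-value should be read with caution.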
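One way to address the warning is to count how many expected cell counts are small and then ask chisq.test() for a Monte Carlo p-value instead of the asymptotic one. This is a sketch, not part of the original analysis: the table is retyped from the output above, and the seed and the number of replicates B are arbitrary choices.

```r
# Farm-by-Tb counts copied from the table(deer$Farm, deer$Tb) output.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18,
            11, 39, 67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1,
            0, 0, 7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tab <- cbind(`0` = tb_neg, `1` = tb_pos)
rownames(tab) <- farms

# Expected counts under independence; cells below 5 trigger the warning.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
n_small  <- sum(expected < 5)

# Asymptotic test (warning suppressed here because we handle it below).
asym <- suppressWarnings(chisq.test(tab))

# Monte Carlo p-value; seed and B are assumptions for this sketch.
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)

print(n_small)
print(asym)
print(sim)
```

If the simulated p-value is also very small, the conclusion that Tb prevalence differs between farms does not depend on the chi-squared approximation.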