Module09_Gordon

Author

Mason Gordon

Module 09

Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

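# Simulate heights for 50 actors per character.
# No seed is set, so the values (and the output below) change on every run.
aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)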
t.test(legolas, aragorn, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 6.5541, df = 88.503, p-value = 3.604e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 11.53917 21.58073
sample estimates:
mean of x mean of y 
 197.4558  180.8958

t.test(legolas, gimli, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 21.503, df = 97.99, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.11855 68.73322
sample estimates:
mean of x mean of y 
 197.4558  134.5299

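Both p-values are far below 0.05 (3.604e-09 for Legolas vs. Aragorn and < 2.2e-16 for Legolas vs. Gimli), so the Legolas actors differ significantly in mean height from both the Aragorn actors and the Gimli actors.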
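
Because rnorm() draws new values on every run, the numbers above will not reproduce exactly. The following is a minimal sketch, assuming a fixed seed (the value 42 is an arbitrary choice and not part of the original analysis), that repeats both comparisons and reads the p-values from the test objects instead of the printout.

```r
# Assumption: seed 42 is arbitrary and only makes the simulation repeatable.
set.seed(42)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# t.test() runs the Welch test by default (var.equal = FALSE).
p_values <- c(
  aragorn = t.test(legolas, aragorn)$p.value,
  gimli   = t.test(legolas, gimli)$p.value
)

# TRUE where the difference in means is significant at the 5% level.
significant <- p_values < 0.05
print(significant)
```

Storing the p-values in a named vector makes it easy to apply the same threshold to every comparison at once.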

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

var.test(gimli, legolas)


    F test to compare two variances

data:  gimli and legolas
F = 1.0209, num df = 49, denom df = 49, p-value = 0.9426
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5793374 1.7990213
sample estimates:
ratio of variances 
          1.020902

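The p-value (0.9426) is well above 0.05, so there is no evidence of a difference in variance between the Gimli and Legolas actors. The 95% confidence interval for the variance ratio (0.58 to 1.80) includes 1.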
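
To show where the F statistic and p-value come from, here is a small sketch that recomputes them by hand and compares them with var.test(). It assumes freshly simulated data with an arbitrary seed, so the numbers will differ from the output above; the variable names f_manual and p_manual are illustrative only.

```r
# Assumption: seed 42 is arbitrary; the data are re-simulated for this sketch.
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

res <- var.test(gimli, legolas)

# The F statistic is the ratio of the two sample variances.
f_manual <- var(gimli) / var(legolas)
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail probability of the F distribution.
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

print(c(F = f_manual, p = p_manual))
```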

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

iris <- read.csv("../data/iris.csv")
library(dplyr)

iris %>%
  group_by(Species) %>%
  summarize(
    correlation = cor(Sepal.Length, Sepal.Width),
    p_value = cor.test(Sepal.Length, Sepal.Width)$p.value
  )

# A tibble: 3 × 3
  Species    correlation  p_value
  <chr>            <dbl>    <dbl>
1 setosa           0.743 6.71e-10
2 versicolor       0.526 8.77e- 5
3 virginica        0.457 8.43e- 4

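All three p-values are below 0.05, so sepal length and sepal width are significantly and positively correlated within each species. The correlation is strongest in setosa (0.743) and weaker in versicolor (0.526) and virginica (0.457).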
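
The same per-species test can be sketched in base R, which also makes the confidence intervals available. This example assumes that R's built-in datasets::iris matches the contents of ../data/iris.csv; the names iris_df, by_species, and results are illustrative.

```r
# Assumption: the built-in data set stands in for ../data/iris.csv.
iris_df <- datasets::iris

# One data frame per species, then one cor.test() per data frame.
by_species <- split(iris_df, iris_df$Species)
results <- t(sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(correlation = unname(ct$estimate),
    lower       = ct$conf.int[1],
    upper       = ct$conf.int[2],
    p_value     = ct$p.value)
}))

# Rows are species; columns are the estimate, 95% interval bounds, and p-value.
print(round(results, 4))
```

A confidence interval that excludes 0 leads to the same conclusion as a p-value below 0.05, and it also shows how precisely each correlation is estimated.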

Using the deer dataset and the chisq.test() function, test:
- If there are significant differences in the number of deer caught per month

deer = read.csv("../data/Deer.csv")
table(deer$Month)


  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))


    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

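The low p-value (< 2.2e-16) indicates that the number of deer caught differs significantly between months. Catches are concentrated in months 10 through 2 (256 deer in month 1) and are nearly absent in months 4 and 5 (3 and 2 deer).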
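
As a check on what chisq.test() does here, the statistic can be recomputed by hand. This sketch uses the monthly counts copied from the table above and assumes the default null hypothesis of equal expected counts in every month; the variable names are illustrative.

```r
# Monthly counts copied from the table(deer$Month) output above.
counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(counts) <- 1:12

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(counts) / length(counts), length(counts))

# Pearson statistic: squared deviations scaled by the expected counts.
x_squared <- sum((counts - expected)^2 / expected)
df <- length(counts) - 1
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Compare with the built-in test.
res <- chisq.test(counts)
print(c(manual = x_squared, chisq.test = unname(res$statistic)))
```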

- If the cases of tuberculosis are uniformly distributed across all farms

table(deer$Farm, deer$Tb)

      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(deer$Farm, deer$Tb))

Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
be incorrect


    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

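The small p-value (1.243e-15) indicates that tuberculosis cases are not uniformly distributed across farms: the proportion of infected deer differs significantly between farms. The test does not identify which farms drive the difference, and the warning shows that some expected counts are small, so the chi-squared approximation may be unreliable for this table.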
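
One way to follow up on the warning is to count the cells with small expected values and to rerun the test with a Monte Carlo p-value. The sketch below rebuilds the table from the counts printed above; the seed and the number of replicates (B = 10000) are arbitrary assumptions, and the variable names are illustrative.

```r
# Counts copied from the table(deer$Farm, deer$Tb) output above.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb0 <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18,
         11, 39, 67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb1 <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1,
         0, 0, 7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tab <- cbind("0" = tb0, "1" = tb1)
rownames(tab) <- farms

# Asymptotic test; the warning is suppressed because it is examined below.
asymptotic <- suppressWarnings(chisq.test(tab))

# Cells with an expected count below 5 are what trigger the warning.
low_expected <- sum(asymptotic$expected < 5)

# Monte Carlo p-value, which does not rely on the chi-squared approximation.
# Assumption: seed 1 and B = 10000 are arbitrary choices.
set.seed(1)
simulated <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)

print(low_expected)
print(simulated$p.value)
```

If the simulated p-value is also below 0.05, the conclusion does not depend on the large-sample approximation.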