Module 9: Simple inference tests in R

Question 1

Make up a vector of 50 random Legolas actors, with a mean height of 195 cm and a standard deviation of 15 cm. Run a t-test to compare this sample of actors to the set of Aragorns and then to the set of Gimlis. Do you find evidence for significant differences?

legolas = rnorm(50, mean=195, sd=15)
aragorn = rnorm(50, mean=180, sd=10)
t.test(legolas, aragorn, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and aragorn
## t = 6.9926, df = 92.552, p-value = 4.124e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.93180 23.19077
## sample estimates:
## mean of x mean of y 
##  198.6301  180.5688

legolas = rnorm(50, mean=195, sd=15)
gimli = rnorm(50, mean=132, sd=15)
t.test(legolas, gimli, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and gimli
## t = 21.462, df = 96.825, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  56.93278 68.53588
## sample estimates:
## mean of x mean of y 
##  196.2789  133.5446

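Yes, both comparisons show significant differences. Legolas vs. Aragorn gives t = 6.99 with p = 4.1e-10, and Legolas vs. Gimli gives t = 21.46 with p < 2.2e-16. Both p-values are far below 0.05 and neither 95% confidence interval for the difference in means includes 0, so we reject the null hypothesis of equal mean heights in both cases. The estimated difference is much larger for the Gimlis (about 63 cm) than for the Aragorns (about 18 cm).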
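Because `rnorm()` draws new values on every run, the numbers above change each time the script is knitted. A minimal sketch of a reproducible version is shown here. The seed value (42) and the object names `res_aragorn`, `res_gimli`, and `p_values` are assumptions made for this illustration, and the group parameters are copied from the code above.

```r
# Fix the random seed so the simulated heights are the same on every run.
# The seed value (42) is an arbitrary choice made for this sketch.
set.seed(42)

legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)

# Welch two-sample t-tests (the default behaviour of t.test()).
res_aragorn <- t.test(legolas, aragorn, alternative = "two.sided")
res_gimli   <- t.test(legolas, gimli, alternative = "two.sided")

# Collect the p-values and compare each one against alpha = 0.05.
p_values <- c(aragorn = res_aragorn$p.value, gimli = res_gimli$p.value)
print(p_values)
print(p_values < 0.05)
```

Using one `legolas` sample for both comparisons also means both tests refer to the same simulated actors, rather than to two separate draws.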

Question 2

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

var.test(legolas, gimli)

## 
##  F test to compare two variances
## 
## data:  legolas and gimli
## F = 0.80155, num df = 49, denom df = 49, p-value = 0.4416
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4548595 1.4124791
## sample estimates:
## ratio of variances 
##          0.8015482

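The p-value (0.4416) is well above 0.05 and the 95% confidence interval for the ratio of variances (0.45 to 1.41) includes 1, so there is no evidence that the two groups differ in variance. This is what we would expect, since both samples were simulated with a standard deviation of 15.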
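To see where the F statistic comes from, it can be recomputed by hand as the ratio of the two sample variances. The sketch below assumes freshly simulated samples with an arbitrary seed, so its numbers will not match the output above; `f_manual` and `p_manual` are names introduced only for this illustration.

```r
# Simulate the two groups again (arbitrary seed, chosen for this sketch).
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
gimli   <- rnorm(50, mean = 132, sd = 15)

# Built-in F test for equality of variances.
f_test <- var.test(legolas, gimli)

# The F statistic is the ratio of the sample variances.
f_manual <- var(legolas) / var(gimli)

# Two-sided p-value from the F distribution with (n1 - 1, n2 - 1) df.
df1 <- length(legolas) - 1
df2 <- length(gimli) - 1
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

print(c(F = f_manual, p = p_manual))
print(c(F = unname(f_test$statistic), p = f_test$p.value))
```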

Question 3

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

iris = read.csv("iris.csv")
iris$Species = factor(iris$Species)

setosa = iris$Species == "setosa"
irisS = iris[setosa, ]
cor.test(irisS$Sepal.Length, irisS$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  irisS$Sepal.Length and irisS$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

versicolor = iris$Species == "versicolor"
irisVe = iris[versicolor, ]
cor.test(irisVe$Sepal.Length, irisVe$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  irisVe$Sepal.Length and irisVe$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

virginica = iris$Species == "virginica"
irisVi = iris[virginica, ]
cor.test(irisVi$Sepal.Length, irisVi$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  irisVi$Sepal.Length and irisVi$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

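Yes. Sepal length and sepal width are positively and significantly correlated within each species: r = 0.74 for setosa (p = 6.7e-10), r = 0.53 for versicolor (p = 8.8e-05), and r = 0.46 for virginica (p = 0.0008). The correlation is strong for setosa and moderate for the other two species, and none of the 95% confidence intervals includes 0.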
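The three per-species tests can also be run in one pass by splitting the data frame on `Species`. This is only a sketch: it uses R's built-in `datasets::iris` on the assumption that `iris.csv` holds the same measurements, and the names `iris_df`, `by_species`, and `results` are introduced here for illustration.

```r
# Use the built-in iris data (assumed to match the contents of iris.csv).
iris_df <- datasets::iris

# Split into one data frame per species.
by_species <- split(iris_df, iris_df$Species)

# Run cor.test() within each species and keep the estimate and p-value.
results <- t(sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(ct$estimate), p_value = ct$p.value)
}))

# One row per species, with columns r and p_value.
print(signif(results, 4))
```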

Question 4

Using the deer dataset and the chisq.test() function, test:
- If there are significant differences in the number of deer caught per month
- If the cases of tuberculosis are uniformly distributed across all farms

deer = read.csv("Deer.csv")
table(deer$Month)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16
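With a single table of counts, `chisq.test()` compares the observed counts against equal expected counts in every category. The sketch below recomputes the statistic by hand from the monthly counts printed above; the variable names are introduced here for illustration.

```r
# Monthly capture counts, copied from the table(deer$Month) output above.
month_counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(month_counts) <- 1:12

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(month_counts) / length(month_counts), length(month_counts))

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
x_squared <- sum((month_counts - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value.
df <- length(month_counts) - 1
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

print(round(x_squared, 2))
print(df)
```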

table(deer$Tb, deer$Farm)

##    
##      AL  AU  BA  BE  CB CRC  HB LCV  LN MAN  MB  MO  NC  NV  PA  PN  QM  RF  RN
##   0  10  23  67   7  88   4  22   0  28  27  16 186  24  18  11  39  67  23  21
##   1   3   0   5   0   3   0   1   1   6  24   5  31   4   1   0   0   7   1   0
##    
##      RO SAL SAU  SE  TI  TN VISO  VY
##   0  31   0   3  16   9  16   13  15
##   1   0   1   0  10   0   2    1   4

chisq.test(table(deer$Tb, deer$Farm))

## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(deer$Tb, deer$Farm)
## X-squared = 129.09, df = 26, p-value = 1.243e-15

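The number of deer caught differs significantly between months (X-squared = 997.07, df = 11, p < 2.2e-16), with most captures falling between October and February. Tuberculosis cases are also not uniformly distributed across farms (X-squared = 129.09, df = 26, p = 1.2e-15), so we reject the null hypothesis that Tb status is independent of farm. Because many farms have very few deer, R warns that the chi-squared approximation may be incorrect, so this second p-value should be treated with some caution.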
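One way to address the warning is to check how many expected counts fall below 5 and then ask `chisq.test()` for a Monte Carlo p-value instead of relying on the chi-squared approximation. The sketch below rebuilds the Tb-by-farm table from the printed output above; the seed, the number of replicates (`B = 2000`), and the object names are assumptions made for this illustration.

```r
# Farm labels and counts, copied from the table(deer$Tb, deer$Farm) output.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_table <- rbind(
  "0" = c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
          67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15),
  "1" = c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
          7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
)
colnames(tb_table) <- farms

# Expected counts under independence; small values trigger the warning.
expected <- suppressWarnings(chisq.test(tb_table))$expected
print(sum(expected < 5))

# Monte Carlo p-value based on simulated tables with the same margins.
set.seed(1)  # arbitrary seed, chosen for this sketch
mc <- chisq.test(tb_table, simulate.p.value = TRUE, B = 2000)
print(mc$p.value)
```

With 2000 replicates the smallest p-value this approach can report is 1/2001, so it serves as a check on the conclusion rather than as a precise estimate.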

Katya Podkovyroff Lewis

2022-06-21
