Module 09 - Simple inference tests in R

Exercises

Make up a vector of 50 random Legolas actors, with mean height of 195cm, and a standard deviation of 15cm. Run a t-test to compare this sample of actors to the set of Aragorns and then the set of Gimlis.

aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)
t.test(legolas, aragorn, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and aragorn
## t = 6.6858, df = 87.098, p-value = 2.099e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.01283 22.17695
## sample estimates:
## mean of x mean of y 
##  195.6145  178.5196

t.test(legolas, gimli, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and gimli
## t = 20.826, df = 97.721, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  57.61429 69.75091
## sample estimates:
## mean of x mean of y 
##  195.6145  131.9319

Do you find evidence for significant differences?

Yes. Both p-values are far below 0.05 (p = 2.099e-09 against Aragorn, p < 2.2e-16 against Gimli), so the mean height of the Legolas sample differs significantly from that of both the Aragorn and the Gimli samples.
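
Because the samples come from rnorm(), the numbers above change on every run. The following is a minimal sketch of how the comparison could be made reproducible and how the p-value and confidence interval can be read from the test object instead of the printed output. The seed value and the helper name compare_heights are assumptions for illustration, not part of the original exercise.

```r
# Hypothetical helper: draw both samples with a fixed seed and run the Welch t-test.
compare_heights <- function(seed) {
  set.seed(seed)  # fixing the seed makes the random draws repeatable
  aragorn <- rnorm(50, mean = 180, sd = 10)
  legolas <- rnorm(50, mean = 195, sd = 15)
  t.test(legolas, aragorn, alternative = "two.sided")
}

res <- compare_heights(42)  # 42 is an arbitrary seed chosen for this sketch

res$p.value   # p-value as a plain number
res$conf.int  # 95% confidence interval for the difference in means
res$estimate  # the two sample means
```

Calling compare_heights() twice with the same seed returns identical results, which makes it easier to quote the numbers in a write-up.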

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors.

var.test(gimli, legolas)

## 
##  F test to compare two variances
## 
## data:  gimli and legolas
## F = 1.1128, num df = 49, denom df = 49, p-value = 0.7098
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6314953 1.9609876
## sample estimates:
## ratio of variances 
##           1.112814

Do these groups have different variance?

No. The p-value is high (p = 0.7098) and the 95 percent confidence interval for the ratio of variances includes 1, so there is no evidence that the two groups differ in variance.
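
As a rough illustration of what var.test() reports, the sketch below checks that the F statistic is simply the ratio of the two sample variances, and pulls the p-value from the result object so it does not have to be copied by hand. The seed is an assumption added for repeatability, so the numbers will not match the output above.

```r
set.seed(1)  # arbitrary seed for this sketch, not from the original exercise
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

ft <- var.test(gimli, legolas)

# The F statistic is the ratio of the sample variances (gimli over legolas).
f_by_hand <- var(gimli) / var(legolas)

unname(ft$statistic)  # F as reported by var.test()
f_by_hand             # same value computed directly
ft$p.value            # p-value taken from the object
```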

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species.

library(magrittr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

iris = read.csv("iris.csv") 
iris %>%
  group_by(Species)

## # A tibble: 150 x 6
## # Groups:   Species [3]
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Code
##           <dbl>       <dbl>        <dbl>       <dbl> <chr>   <int>
##  1          5.1         3.5          1.4         0.2 setosa      1
##  2          4.9         3            1.4         0.2 setosa      1
##  3          4.7         3.2          1.3         0.2 setosa      1
##  4          4.6         3.1          1.5         0.2 setosa      1
##  5          5           3.6          1.4         0.2 setosa      1
##  6          5.4         3.9          1.7         0.4 setosa      1
##  7          4.6         3.4          1.4         0.3 setosa      1
##  8          5           3.4          1.5         0.2 setosa      1
##  9          4.4         2.9          1.4         0.2 setosa      1
## 10          4.9         3.1          1.5         0.1 setosa      1
## # … with 140 more rows

setosa = iris %>%
  filter(Species == "setosa")
versicolor = iris %>%
  filter(Species == "versicolor")
virginica = iris %>%
  filter(Species == "virginica")
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  setosa$Sepal.Length and setosa$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  versicolor$Sepal.Length and versicolor$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

cor.test(virginica$Sepal.Length, virginica$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  virginica$Sepal.Length and virginica$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

Are these correlated?

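Yes. The p-value is below 0.001 for all three species, indicating a significant positive correlation between Sepal Length and Sepal Width within each species. The correlation is strongest for setosa (r = 0.74) and weaker for versicolor (r = 0.53) and virginica (r = 0.46).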
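
The three filter() and cor.test() calls can also be written as one loop over the species. This is a minimal sketch in base R; it uses the iris data that ships with R (datasets::iris) on the assumption that it holds the same measurements as the iris.csv file used above.

```r
# Assumption: the built-in iris data stands in for iris.csv.
iris_builtin <- datasets::iris

# One data frame per species.
by_species <- split(iris_builtin, iris_builtin$Species)

# Run cor.test() on each subset and keep the estimate and the p-value.
results <- sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(cor = unname(ct$estimate), p.value = ct$p.value)
})

round(results, 4)  # rows: cor and p.value; columns: one per species
```

The result is a small matrix, so the correlations for all species can be compared side by side.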

Using the deer dataset and the chisq.test() function, test if there are significant differences in the number of deer caught per month.

deer = read.csv("deer.csv")
str(deer)

## 'data.frame':    1182 obs. of  9 variables:
##  $ Farm   : chr  "AL" "AL" "AL" "AL" ...
##  $ Month  : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Year   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sex    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ clas1_4: int  4 4 3 4 4 4 4 4 4 4 ...
##  $ LCT    : num  191 180 192 196 204 190 196 200 197 208 ...
##  $ KFI    : num  20.4 16.4 15.9 17.3 NA ...
##  $ Ecervi : num  0 0 2.38 0 0 0 1.21 0 0.8 0 ...
##  $ Tb     : int  0 0 0 0 NA 0 NA 1 0 0 ...

table(deer$Month)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16

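Since the p-value is very low (p < 2.2e-16), the number of deer caught differs significantly between months. With no further arguments, chisq.test() compares the observed counts against equal expected counts for every month.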
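
To make the goodness-of-fit calculation concrete, the sketch below rebuilds the monthly counts from the table(deer$Month) output above and computes the chi-squared statistic by hand, assuming the same number of deer is expected in every month. The variable names are chosen for this illustration.

```r
# Monthly counts copied from the table(deer$Month) output above.
month_counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(month_counts) <- 1:12

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(month_counts) / 12, 12)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
x2_by_hand <- sum((month_counts - expected)^2 / expected)

res <- chisq.test(month_counts)

x2_by_hand     # statistic computed directly
res$statistic  # statistic from chisq.test()
res$parameter  # degrees of freedom: 12 categories minus 1
```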

Using the deer dataset and the chisq.test() function, test if the cases of tuberculosis are uniformly distributed across all farms.

table(deer$Tb, deer$Farm)

##    
##      AL  AU  BA  BE  CB CRC  HB LCV  LN MAN  MB  MO  NC  NV  PA  PN  QM  RF  RN
##   0  10  23  67   7  88   4  22   0  28  27  16 186  24  18  11  39  67  23  21
##   1   3   0   5   0   3   0   1   1   6  24   5  31   4   1   0   0   7   1   0
##    
##      RO SAL SAU  SE  TI  TN VISO  VY
##   0  31   0   3  16   9  16   13  15
##   1   0   1   0  10   0   2    1   4

chisq.test(table(deer$Tb, deer$Farm))

## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(deer$Tb, deer$Farm)
## X-squared = 129.09, df = 26, p-value = 1.243e-15

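Since the p-value is very low (p = 1.243e-15), the proportion of tuberculosis cases differs significantly between farms, so the cases do not appear to be uniformly distributed. The warning should be kept in mind, though: several farms contributed only a handful of deer, so some expected counts are small and the chi-squared approximation may be unreliable.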
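
One possible way to follow up on the warning is to look at the expected counts and to ask chisq.test() for a Monte Carlo p-value, which does not rely on the chi-squared approximation. The sketch below rebuilds the table from the output above. The seed and the number of replicates B are arbitrary choices for this illustration.

```r
# Counts copied from the table(deer$Tb, deer$Farm) output above.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN",
           "RO", "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39,
            67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0,
            7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tb_table <- rbind("0" = tb_neg, "1" = tb_pos)
colnames(tb_table) <- farms

# Standard test; the warning is suppressed here because it is expected.
asym <- suppressWarnings(chisq.test(tb_table))

# Cells with an expected count below 5 are the usual cause of the warning.
sum(asym$expected < 5)

# Monte Carlo p-value based on simulated tables with the same margins.
set.seed(1)  # arbitrary seed so the simulation is repeatable
sim <- chisq.test(tb_table, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

With B = 2000 the smallest p-value the simulation can report is 1/2001, so it cannot reproduce the tiny asymptotic value, but it can show whether the conclusion still holds.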

M09

Alyssa Castor

5/8/2020
