Module 9 Exercise

Module 9: Simple Inference Tests

-The goal of this lab was to run simple inference tests on the provided data. Before beginning, we loaded the dplyr package and read in the data.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Deer <- read.csv("Deer.csv")
iris <- read.csv("iris.csv")

-For the first problem, we had to make a vector of 50 randomly generated heights for Legolas actors, drawn from a normal distribution with a mean of 195 cm and a standard deviation of 15 cm. From there we ran t-tests to compare this sample to our samples of Aragorn and Gimli actors.

aragorn <- rnorm(50, mean = 180, sd = 10)
gimli <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)
t.test(aragorn, legolas, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  aragorn and legolas
## t = -5.4084, df = 84.979, p-value = 5.752e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.269358  -8.447525
## sample estimates:
## mean of x mean of y 
##  181.3406  194.6991

t.test(gimli, legolas, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  gimli and legolas
## t = -22.596, df = 97.43, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -69.02309 -57.87722
## sample estimates:
## mean of x mean of y 
##  131.2489  194.6991

As shown by the output, there is a statistically significant difference in mean height between the Legolas and Aragorn actors (p = 5.752e-07), and between the Legolas and Gimli actors (p < 2.2e-16).
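
Because the heights are simulated, the numbers above change on every run. A minimal sketch of a reproducible version follows, assuming an arbitrary seed of 42 (not part of the original lab, so the values it produces will differ from the output shown above). It also shows how the pieces of the printed result can be pulled out of the object returned by t.test().

```r
# Fix the random seed so the simulated heights and test results are repeatable.
set.seed(42)  # arbitrary seed chosen for illustration, not from the lab

aragorn <- rnorm(50, mean = 180, sd = 10)
legolas <- rnorm(50, mean = 195, sd = 15)

# t.test() runs Welch's test by default (var.equal = FALSE).
res <- t.test(aragorn, legolas, alternative = "two.sided")

res$method     # name of the test that was run
res$p.value    # p-value for the difference in means
res$conf.int   # 95 percent confidence interval for the difference
res$estimate   # sample means of the two groups
```
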
The second problem had us re-run the variance test, this time comparing the Gimli and Legolas actors.

var.test(gimli, legolas)

## 
##  F test to compare two variances
## 
## data:  gimli and legolas
## F = 0.85784, num df = 49, denom df = 49, p-value = 0.5936
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4868036 1.5116753
## sample estimates:
## ratio of variances 
##          0.8578397

As shown by the output, there does not seem to be a statistically significant difference in variance between the two groups (F = 0.85784, p = 0.5936).
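
To make clear what var.test() reports, the sketch below recomputes the F statistic by hand as the ratio of the two sample variances. The seed is an arbitrary assumption for reproducibility, so the values will not match the output above.

```r
# Simulate the two groups again; both are drawn with the same true sd of 15.
set.seed(42)  # arbitrary seed chosen for illustration, not from the lab

gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

vt <- var.test(gimli, legolas)

# The F statistic is the ratio of the sample variances.
f_by_hand <- var(gimli) / var(legolas)

f_by_hand      # ratio computed directly
vt$statistic   # same value as reported by var.test()
vt$parameter   # numerator and denominator df, each n - 1 = 49
```
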
For the third problem, we redid the correlation test for Sepal Length and Sepal Width within the iris dataset, this time separately for each of the three species.

iris_seto <- iris %>%
  filter(Species == "setosa")
cor.test(iris_seto$Sepal.Length, iris_seto$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  iris_seto$Sepal.Length and iris_seto$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

iris_versi <- iris %>%
  filter(Species == "versicolor")
cor.test(iris_versi$Sepal.Length, iris_versi$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  iris_versi$Sepal.Length and iris_versi$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

iris_virgin <- iris %>%
  filter(Species == "virginica")
cor.test(iris_virgin$Sepal.Length, iris_virgin$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  iris_virgin$Sepal.Length and iris_virgin$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

As shown by the three outputs, all three species had statistically significant positive correlations between Sepal Length and Sepal Width, strongest in setosa (r = 0.74) and weaker in versicolor (r = 0.53) and virginica (r = 0.46).
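
The three filter-then-test steps can also be collapsed into one pass over the species. The sketch below is one way to do that in base R. It assumes that iris.csv has the same columns and values as R's built-in datasets::iris, which it uses in place of the file; iris_df, by_species and results are illustrative names.

```r
# Assumption: the built-in iris data stands in for the lab's iris.csv.
iris_df <- datasets::iris

# Split the data frame into one piece per species.
by_species <- split(iris_df, iris_df$Species)

# Run cor.test() on each piece and keep the correlation and its p-value.
results <- t(sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(ct$estimate), p_value = ct$p.value)
}))

results  # one row per species, columns r and p_value
```
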

-Finally, we used chisq.test() to test whether the number of deer caught differs significantly between months, and whether cases of tuberculosis are uniformly distributed across farms.

table(Deer$Month)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(Deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(Deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16
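
As a check on what this test does, the statistic can be recomputed by hand. The sketch below copies the monthly counts from the table(Deer$Month) output above and compares them with the counts expected if every month were equally likely. The variable names are illustrative.

```r
# Monthly counts copied from the table(Deer$Month) output.
month_counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(month_counts) <- 1:12

# Under the null hypothesis each month gets the same expected count.
expected <- rep(sum(month_counts) / length(month_counts), length(month_counts))

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
x_squared <- sum((month_counts - expected)^2 / expected)
df <- length(month_counts) - 1
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

x_squared                            # about 997.07, as reported above
chisq.test(month_counts)$statistic   # same value from chisq.test()
p_value
```

Note that the default null hypothesis treats all twelve months as equally likely and ignores their differing lengths.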

table(Deer$Farm, Deer$Tb)

##       
##          0   1
##   AL    10   3
##   AU    23   0
##   BA    67   5
##   BE     7   0
##   CB    88   3
##   CRC    4   0
##   HB    22   1
##   LCV    0   1
##   LN    28   6
##   MAN   27  24
##   MB    16   5
##   MO   186  31
##   NC    24   4
##   NV    18   1
##   PA    11   0
##   PN    39   0
##   QM    67   7
##   RF    23   1
##   RN    21   0
##   RO    31   0
##   SAL    0   1
##   SAU    3   0
##   SE    16  10
##   TI     9   0
##   TN    16   2
##   VISO  13   1
##   VY    15   4

chisq.test(table(Deer$Farm, Deer$Tb))

## Warning in chisq.test(table(Deer$Farm, Deer$Tb)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(Deer$Farm, Deer$Tb)
## X-squared = 129.09, df = 26, p-value = 1.243e-15

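As shown by the outputs, there are significant differences in the number of deer caught per month (X-squared = 997.07, df = 11, p < 2.2e-16). For tuberculosis, the test indicates that cases are not uniformly distributed across farms (X-squared = 129.09, df = 26, p = 1.243e-15). R warns that the chi-squared approximation may be incorrect for this second test, which is expected because several farms contributed very few deer.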
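
One way to follow up on the warning is to look at the expected counts and to re-run the test with a simulated p-value, which does not rely on the chi-squared approximation. The sketch below rebuilds the farm-by-Tb table from the counts printed above rather than reading Deer.csv. The seed, the number of replicates B, and the variable names are assumptions made for illustration.

```r
# Counts transcribed from the table(Deer$Farm, Deer$Tb) output.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18,
            11, 39, 67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1,
            0, 0, 7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
farm_tb <- cbind("0" = tb_neg, "1" = tb_pos)
rownames(farm_tb) <- farms

# Expected counts under independence; R raises the warning when any are below 5.
expected <- suppressWarnings(chisq.test(farm_tb))$expected
sum(expected < 5)  # number of cells with a small expected count

# Monte Carlo p-value based on B simulated tables with the same margins.
set.seed(42)  # arbitrary seed chosen for illustration
sim <- chisq.test(farm_tb, simulate.p.value = TRUE, B = 2000)
sim$statistic  # same X-squared as the standard test
sim$p.value    # simulated p-value
```

The statistic itself is unchanged. Only the way the p-value is obtained differs, so this serves as a robustness check on the conclusion rather than a different test.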


Lucas Brizolara

2022-06-13
