Module 9: Simple Inference Tests

Load in data

deer = read.csv("Deer.csv")
iris = read.csv("iris.csv")

Task 1, Make up a vector of 50 random Legolas actors, with mean height of 195cm, and a standard deviation of 15cm. Run a t-test to compare this sample of actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

legolas = rnorm(50, mean=195, sd=15)
aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)

t.test(legolas, aragorn, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and aragorn
## t = 4.2586, df = 93.175, p-value = 4.907e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.099234 16.756692
## sample estimates:
## mean of x mean of y 
##  192.4947  181.0667

t.test(legolas, gimli, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and gimli
## t = 20.992, df = 97.985, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  56.16865 67.89712
## sample estimates:
## mean of x mean of y 
##  192.4947  130.4618

With such low p-values in both t-tests, I reject the null hypothesis of equal means in favor of the alternative: there is a significant height difference between the Legolas and Aragorn actors, and between the Legolas and Gimli actors.
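Because rnorm() draws new values on every run, the numbers in these t-tests change each time the report is knitted. A minimal sketch, assuming an arbitrary seed (the value 42 is my choice, not part of the original analysis), of fixing the draws and reading the p-values from the test objects instead of from the printout:

```r
# Fix the random number generator so the simulated heights are reproducible.
# The seed value is arbitrary and not part of the original analysis.
set.seed(42)

legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)

# t.test() returns an "htest" list; store it to reuse individual components.
la <- t.test(legolas, aragorn, alternative = "two.sided")
lg <- t.test(legolas, gimli, alternative = "two.sided")

# Compare each p-value against a conventional 0.05 threshold.
alpha <- 0.05
p_values <- c(aragorn = la$p.value, gimli = lg$p.value)
print(p_values < alpha)

# 95% confidence interval for the difference in means (Legolas minus Aragorn).
print(la$conf.int)
```

Storing the result also makes it easy to report the confidence interval alongside the p-value, which says how large the difference is and not only whether it is detectable.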

Task 2, Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

var.test(gimli, legolas)

## 
##  F test to compare two variances
## 
## data:  gimli and legolas
## F = 0.9757, num df = 49, denom df = 49, p-value = 0.9317
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.553686 1.719366
## sample estimates:
## ratio of variances 
##           0.975699

With such a high p-value (0.9317), I cannot reject the null hypothesis that the ratio of variances equals 1, so there is no evidence that the heights of the two groups differ in variance.
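The F statistic reported by var.test() is the ratio of the two sample variances, which is why a value close to 1 goes with a large p-value. A small sketch, using freshly simulated heights and an arbitrary seed (both assumptions, so the numbers will not match the output above):

```r
set.seed(1)  # arbitrary seed, not from the original analysis
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

ft <- var.test(gimli, legolas)

# The F statistic is var(x) / var(y) for the two samples.
ratio <- var(gimli) / var(legolas)
print(c(F = unname(ft$statistic), ratio = ratio))
```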

Task 3, Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

setosa = subset(iris, Species == "setosa")          # subset species to make correlation easier
versicolor = subset(iris, Species == "versicolor")
virginica = subset(iris, Species == "virginica")

cor.test(setosa$Sepal.Length, setosa$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  setosa$Sepal.Length and setosa$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  versicolor$Sepal.Length and versicolor$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

cor.test(virginica$Sepal.Length, virginica$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  virginica$Sepal.Length and virginica$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

All three species have a low p-value, so I reject the null hypothesis of zero correlation: sepal length and sepal width are significantly positively correlated within each species (r = 0.74 for setosa, 0.53 for versicolor, 0.46 for virginica).
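The three separate subset() and cor.test() calls can also be written as one pass over the species. This is only a sketch: it assumes iris.csv has the same columns as R's built-in iris data set, which is used here so the code runs without the CSV file, and the names flowers, by_species and results are illustrative.

```r
# Assumption: iris.csv has the same columns as R's built-in iris data set,
# which is used here so the sketch runs without the CSV file.
flowers <- datasets::iris

# Split the data frame into one piece per species, then test each piece.
by_species <- split(flowers, flowers$Species)
results <- sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(ct$estimate), p.value = ct$p.value)
})

# One column per species; the rows are the correlation and its p-value.
print(round(results, 4))
```

Keeping the results in one matrix makes the species easier to compare side by side than three separate printouts.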

Task 4, Using the deer dataset and the chisq.test() function, test if there are significant differences in the number of deer caught per month, and if the cases of tuberculosis are uniformly distributed across all farms.

chisq.test(table(deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16

chisq.test(table(deer$Farm, deer$Tb))

## Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(deer$Farm, deer$Tb)
## X-squared = 129.09, df = 26, p-value = 1.243e-15

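The chi-squared test on deer caught per month yielded a very low p-value, so I reject the null hypothesis of equal counts: the number of deer caught differs significantly between months. The test of farm against TB status also gives a low p-value, so I reject the null hypothesis that TB status is independent of farm: TB cases are not uniformly distributed across the farms. The warning about the chi-squared approximation suggests that some expected cell counts are small, so this second p-value should be treated with some caution.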
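The "Chi-squared approximation may be incorrect" warning usually appears when some expected cell counts are small. One way to check this, sketched below with hypothetical farm-by-TB counts (invented for illustration, not taken from Deer.csv), is to inspect the expected counts and then ask chisq.test() for a Monte Carlo p-value through its simulate.p.value argument:

```r
# Hypothetical farm-by-TB counts, invented for illustration only;
# they are not taken from Deer.csv.
tb_counts <- matrix(c(30, 2,
                      12, 9,
                       4, 1,
                      25, 3),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(Farm = c("A", "B", "C", "D"),
                                    Tb = c("0", "1")))

# Expected counts under independence; values below about 5 make the
# chi-squared approximation unreliable.
expected <- suppressWarnings(chisq.test(tb_counts))$expected
print(round(expected, 2))

# Monte Carlo p-value that does not rely on the large-sample approximation.
set.seed(9)  # arbitrary seed so the simulated p-value is repeatable
sim <- chisq.test(tb_counts, simulate.p.value = TRUE, B = 2000)
print(sim$p.value)
```

If the simulated p-value for the real deer table is still very small, the conclusion that TB status depends on farm does not hinge on the approximation.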

Tucker Langston

5/26/2021
