Module 9

Introduction

This report works through basic statistical inference tests using R, RStudio, and knitr. The goals of this report are to:

Practice statistical inference tests
Continue practicing previous concepts

Read in the data

iris<-read.csv("iris.csv")
deer<-read.csv("Deer.csv")

Define variables

# Simulate heights for 50 actors per role (random draws, so values differ on each run)
aragorn <- rnorm(50, mean=180, sd=10)
gimli <- rnorm(50, mean=132, sd=15)
legolas <- rnorm(50, mean=195, sd=15)
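
Because rnorm() draws new random numbers on every run, the test results in this report change each time it is knitted. A minimal sketch of making the draws repeatable is shown here; the seed value 42 and the variable names are arbitrary choices for illustration and are not taken from the original report.

```r
# Fix the random number generator state so the draws are repeatable.
# The seed value 42 is an arbitrary choice, not part of the original report.
set.seed(42)
first_draw <- rnorm(50, mean = 180, sd = 10)

# Resetting the same seed and repeating the same call reproduces the sample.
set.seed(42)
second_draw <- rnorm(50, mean = 180, sd = 10)

# TRUE when both samples contain exactly the same values.
identical(first_draw, second_draw)
```

Calling set.seed() once before the three rnorm() calls would be enough to make the whole report reproducible.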

Compare actor data sets of Legolas, Aragorn, and Gimli

t.test(legolas, aragorn, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and aragorn
## t = 5.4299, df = 84.227, p-value = 5.348e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   8.084051 17.426540
## sample estimates:
## mean of x mean of y 
##  191.8878  179.1325

t.test(legolas, gimli, alternative="two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  legolas and gimli
## t = 19.88, df = 94.476, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  55.45838 67.76428
## sample estimates:
## mean of x mean of y 
##  191.8878  130.2765

We find significant evidence to reject the null hypothesis that there is no difference in mean height between the actors who played Legolas and the actors who played Aragorn (t = 5.43, p = 5.3e-07). Likewise, we find significant evidence to reject the null hypothesis that there is no difference in mean height between the actors who played Legolas and the actors who played Gimli (t = 19.88, p < 2.2e-16). In both comparisons the Legolas group has the larger sample mean.
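
To make the Welch test less of a black box, the sketch below recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand and compares them with t.test(). The samples x and y are freshly simulated with an arbitrary seed, so the numbers will not match the output above.

```r
# Simulated samples for illustration only (arbitrary seed, hypothetical names).
set.seed(1)
x <- rnorm(50, mean = 195, sd = 15)
y <- rnorm(50, mean = 180, sd = 10)

# Squared standard error of each sample mean.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with the built-in test; each comparison should return TRUE.
res <- t.test(x, y, alternative = "two.sided")
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```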

Conduct an F-test on Gimli and Legolas

var.test(gimli, legolas)

## 
##  F test to compare two variances
## 
## data:  gimli and legolas
## F = 1.4787, num df = 49, denom df = 49, p-value = 0.1745
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8391182 2.6057208
## sample estimates:
## ratio of variances 
##           1.478684

We do not find significant evidence against the null hypothesis (F = 1.48, p = 0.17), so we fail to reject the hypothesis that there is no difference in variance between the Gimli and Legolas groups of actors.
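
The F statistic reported by var.test() is simply the ratio of the two sample variances. The sketch below reproduces it by hand, along with the two-sided p-value, on newly simulated data; the seed and the names g and l are assumptions made for illustration, so the values differ from the output above.

```r
# Simulated samples for illustration only (arbitrary seed, hypothetical names).
set.seed(2)
g <- rnorm(50, mean = 132, sd = 15)
l <- rnorm(50, mean = 195, sd = 15)

# F statistic: ratio of the sample variances.
f_manual <- var(g) / var(l)

# Numerator and denominator degrees of freedom.
df1 <- length(g) - 1
df2 <- length(l) - 1

# Two-sided p-value: twice the smaller tail probability of the F distribution.
lower_tail <- pf(f_manual, df1, df2)
p_manual <- 2 * min(lower_tail, 1 - lower_tail)

# Compare with the built-in test; both comparisons should return TRUE.
res <- var.test(g, l)
all.equal(unname(res$statistic), f_manual)
all.equal(res$p.value, p_manual)
```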

Run correlation tests for each species within the Iris Dataset to compare Sepal Length and Sepal Width

setosa <- subset(iris, Code == "1")
versicolor <- subset(iris, Code == "2")
virginica <- subset(iris, Code == "3")
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  setosa$Sepal.Length and setosa$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5851391 0.8460314
## sample estimates:
##       cor 
## 0.7425467

cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  versicolor$Sepal.Length and versicolor$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2900175 0.7015599
## sample estimates:
##       cor 
## 0.5259107

cor.test(virginica$Sepal.Length, virginica$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  virginica$Sepal.Length and virginica$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049657 0.6525292
## sample estimates:
##       cor 
## 0.4572278

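We find significant evidence to reject the null hypothesis that there is no correlation between Sepal Length and Sepal Width for each of the Setosa, Versicolor, and Virginica species of iris. The correlation is positive in all three species and strongest in Setosa (r = 0.74), followed by Versicolor (r = 0.53) and Virginica (r = 0.46).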
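
The three tests can also be run in a single pass by splitting the data by species. The sketch below assumes R's built-in datasets::iris as a stand-in for iris.csv; the built-in data identifies species with a Species column rather than the Code column used in this report, and the object names are hypothetical.

```r
# Built-in iris data, used here as a stand-in for iris.csv (an assumption).
iris_builtin <- datasets::iris

# One data frame per species.
by_species <- split(iris_builtin, iris_builtin$Species)

# Run the Pearson correlation test within each species.
results <- lapply(by_species, function(d) {
  cor.test(d$Sepal.Length, d$Sepal.Width)
})

# Collect the estimate and p-value of each test in one table.
summary_table <- data.frame(
  species = names(results),
  r = sapply(results, function(res) unname(res$estimate)),
  p_value = sapply(results, function(res) res$p.value),
  row.names = NULL
)
summary_table
```

Gathering the results in one table makes it easier to compare the strength of the correlation across species than reading three separate outputs.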

Run a Chi Squared test on the Deer Dataset

chisq.test(table(deer$Month))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16

chisq.test(table(deer$Tb,deer$Farm))

## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  table(deer$Tb, deer$Farm)
## X-squared = 129.09, df = 26, p-value = 1.243e-15
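
The warning above usually appears when some expected cell counts fall below 5. The sketch below shows one way to inspect the expected counts and to request a Monte Carlo p-value instead of relying on the chi-squared approximation. The table is invented for illustration and is not taken from Deer.csv, and the seed and number of replicates are arbitrary choices.

```r
# Hypothetical 2 x 3 table with sparse cells, invented for illustration.
tab <- matrix(c(1, 12, 2, 9, 8, 3), nrow = 2,
              dimnames = list(status = c("negative", "positive"),
                              group = c("A", "B", "C")))

# Expected counts under independence; the warning is suppressed on purpose
# because we only want to inspect the expected values here.
expected <- suppressWarnings(chisq.test(tab))$expected
min(expected)

# Monte Carlo p-value based on 2000 simulated tables (arbitrary seed).
set.seed(3)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

For small tables, fisher.test() is another commonly used alternative.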

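We find statistically significant evidence to reject the null hypothesis that the number of deer caught is uniform across months (X-squared = 997.07, df = 11, p < 2.2e-16). Similarly, we find significant evidence to reject the null hypothesis that tuberculosis status is independent of farm (X-squared = 129.09, df = 26, p = 1.2e-15), so the cases of tuberculosis are not distributed evenly across farms. Because R warns that the chi-squared approximation may be incorrect for this table, which usually indicates small expected counts in some cells, the second result should be interpreted with caution.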
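
The goodness-of-fit statistic used for the monthly counts can be reproduced by hand as the sum of (observed - expected)^2 / expected. The sketch below uses made-up counts for four months, not the deer data, and checks the result against chisq.test().

```r
# Hypothetical monthly counts, invented for illustration only.
observed <- c(Jan = 30, Feb = 12, Mar = 8, Apr = 25)

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-squared statistic and its degrees of freedom (categories minus one).
chi_manual <- sum((observed - expected)^2 / expected)
df_manual <- length(observed) - 1

# Upper-tail p-value from the chi-squared distribution.
p_manual <- pchisq(chi_manual, df = df_manual, lower.tail = FALSE)

# Compare with the built-in test; both comparisons should return TRUE.
res <- chisq.test(observed)
all.equal(unname(res$statistic), chi_manual)
all.equal(res$p.value, p_manual)
```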