Module 9: Probability and Statistical Inference

Author

Bryce Nelson

T-Tests

Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis.

setwd("/Users/u6022167/Desktop/GEOG5680/Module9")

aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)
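
Because rnorm() draws new random values on every run, the numbers in the output below change each time the script is executed. A minimal sketch of making the samples reproducible, assuming an arbitrary seed value of 42 and the illustrative names aragorn_a and aragorn_b (neither is part of the original analysis):

```r
# Fix the random number generator state so the draws are repeatable.
# The seed value 42 is an arbitrary assumption, not from the original analysis.
set.seed(42)
aragorn_a = rnorm(50, mean = 180, sd = 10)

# Resetting the same seed and repeating the call gives the same sample
set.seed(42)
aragorn_b = rnorm(50, mean = 180, sd = 10)

identical(aragorn_a, aragorn_b)
```

Calling set.seed() once before the three rnorm() lines would be enough in practice; the seed is reset here only to demonstrate the repeatability.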

Legolas vs. Aragorn

H_0: The Legolas and Aragorn actors have the same mean height.
H_a: The Legolas and Aragorn actors have different mean heights.

Legolas vs. Gimli

H_0: The Legolas and Gimli actors have the same mean height.
H_a: The Legolas and Gimli actors have different mean heights.

t.test(legolas, gimli, alternative="two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 20.886, df = 95.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 56.09942 67.88299
sample estimates:
mean of x mean of y 
 192.7535  130.7623

t.test(legolas, aragorn, alternative="two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 4.2595, df = 88.021, p-value = 5.116e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.269111 17.234841
sample estimates:
mean of x mean of y 
 192.7535  181.0015

Try the “greater” alternative, since Legolas is an elf and is likely taller than both Gimli and Aragorn.

t.test(legolas,aragorn, alternative="greater")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 4.2595, df = 88.021, p-value = 2.558e-05
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 7.165594      Inf
sample estimates:
mean of x mean of y 
 192.7535  181.0015

t.test(legolas,gimli, alternative="greater")


    Welch Two Sample t-test

data:  legolas and gimli
t = 20.886, df = 95.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 57.06146      Inf
sample estimates:
mean of x mean of y 
 192.7535  130.7623

Do you find evidence for significant differences?

Both two-sided t-tests produced very small p-values: p = 5.116 × 10⁻⁵ for Legolas vs. Aragorn and p < 2.2 × 10⁻¹⁶ for Legolas vs. Gimli. The null hypotheses of equal mean heights are therefore rejected in both cases. The one-sided (“greater”) tests are also significant (p = 2.558 × 10⁻⁵ and p < 2.2 × 10⁻¹⁶), so there is strong evidence that the Legolas actors are, on average, taller than both the Aragorn and Gimli actors.
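
In the output above, the one-sided p-value for Legolas vs. Aragorn (2.558e-05) is half of the two-sided value (5.116e-05). This happens because the t statistic is positive, so the observed difference lies entirely in the tail tested by the “greater” alternative. A short sketch checking that relationship on freshly simulated heights; the seed and the names two_sided and one_sided are assumptions for illustration, so the numbers will not match the output above:

```r
set.seed(1)  # arbitrary seed (assumption), only for repeatability
aragorn = rnorm(50, mean = 180, sd = 10)
legolas = rnorm(50, mean = 195, sd = 15)

two_sided = t.test(legolas, aragorn, alternative = "two.sided")
one_sided = t.test(legolas, aragorn, alternative = "greater")

# Both tests share the same t statistic; only the p-value changes
two_sided$statistic
one_sided$statistic

# When t > 0, the "greater" p-value is half of the two-sided p-value
all.equal(one_sided$p.value, two_sided$p.value / 2)
```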

Variance Test (F-Test, var.test)

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors.

var.test(gimli, legolas)


    F test to compare two variances

data:  gimli and legolas
F = 0.73157, num df = 49, denom df = 49, p-value = 0.2774
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.4151507 1.2891710
sample estimates:
ratio of variances 
         0.7315738

Do these groups have different variance?

There is no evidence that these groups have different variances. The ratio of variances was 0.732, meaning the sample variance of the Gimli actor heights was approximately 27% smaller than that of the Legolas actor heights. With p = 0.2774 and a 95 percent confidence interval (0.415 to 1.289) that contains 1, the difference in variance is not significant.
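
The F statistic reported by var.test() is the ratio of the two sample variances, and the two-sided p-value comes from the F distribution with (n1 - 1, n2 - 1) degrees of freedom. A minimal sketch of that calculation on simulated data; the seed and the names f_manual and p_manual are assumptions for illustration, so the values will differ from the output above:

```r
set.seed(1)  # arbitrary seed (assumption), only for repeatability
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)

# The F statistic is the ratio of the two sample variances
f_manual = var(gimli) / var(legolas)

# Two-sided p-value from the F distribution with (n1 - 1, n2 - 1) df
df1 = length(gimli) - 1
df2 = length(legolas) - 1
p_lower = pf(f_manual, df1, df2)
p_manual = 2 * min(p_lower, 1 - p_lower)

# Compare the hand calculation with var.test()
res = var.test(gimli, legolas)
all.equal(unname(res$statistic), f_manual)
all.equal(res$p.value, p_manual)
```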

Correlation Tests

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species.

iris = read.csv("iris.csv")

# Subset the data by species
setosa = subset(iris, Species == "setosa")
versicolor = subset(iris, Species == "versicolor")
virginica = subset(iris, Species == "virginica")

cor.test(setosa$Sepal.Length, setosa$Sepal.Width)


    Pearson's product-moment correlation

data:  setosa$Sepal.Length and setosa$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467

cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)


    Pearson's product-moment correlation

data:  versicolor$Sepal.Length and versicolor$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107

cor.test(virginica$Sepal.Length, virginica$Sepal.Width)


    Pearson's product-moment correlation

data:  virginica$Sepal.Length and virginica$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278

Are these correlated?

Sepal length and sepal width are positively correlated within all three iris species. Each p-value is well below 0.05 (the largest is 0.0008435, for virginica), so we can reject the null hypothesis of no correlation in every case. The Pearson product-moment correlation coefficients (“cor”) range from moderate to strong: 0.46 for virginica, 0.53 for versicolor, and 0.74 for setosa, which has the strongest correlation.
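
The three separate cor.test() calls can also be run in one pass by splitting the data by species. A sketch under the assumption that R's built-in iris data frame stands in for iris.csv (same column names); by_species, results, and summary_table are illustrative names:

```r
# Assumption: R's built-in iris data frame stands in for iris.csv
data(iris)

# Split the data frame by species, then run cor.test() on each piece
by_species = split(iris, iris$Species)
results = lapply(by_species, function(d) cor.test(d$Sepal.Length, d$Sepal.Width))

# Collect the estimate and p-value for each species into one table
summary_table = data.frame(
  cor = sapply(results, function(r) unname(r$estimate)),
  p_value = sapply(results, function(r) r$p.value)
)
summary_table
```

Putting the estimates side by side makes it easier to compare the strength of the correlation across species than scanning three separate printouts.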

Chi-Squared Tests

Deer Caught Per Month

Using the deer dataset and the chisq.test() function, test if there are significant differences in the number of deer caught per month.

deer = read.csv("deer.csv")

str(deer)

'data.frame':   1182 obs. of  9 variables:
 $ Farm   : chr  "AL" "AL" "AL" "AL" ...
 $ Month  : int  10 10 10 10 10 10 10 10 10 10 ...
 $ Year   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Sex    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ clas1_4: int  4 4 3 4 4 4 4 4 4 4 ...
 $ LCT    : num  191 180 192 196 204 190 196 200 197 208 ...
 $ KFI    : num  20.4 16.4 15.9 17.3 NA ...
 $ Ecervi : num  0 0 2.38 0 0 0 1.21 0 0.8 0 ...
 $ Tb     : int  0 0 0 0 NA 0 NA 1 0 0 ...

table(deer$Month)


  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))


    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

Significance

The test was significant because the p-value was far below 0.05 (p < 2.2 × 10⁻¹⁶), indicating that the observed differences were much larger than would be expected by chance alone. Deer captures were not uniformly distributed throughout the year.
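
For a goodness-of-fit test with equal probabilities, the statistic is the sum of (observed - expected)² / expected, where every month has the same expected count. A minimal sketch that redoes the calculation by hand from the monthly counts printed above; caught, expected, x2_manual, and p_manual are illustrative names:

```r
# Monthly capture counts copied from the table(deer$Month) output
caught = c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(caught) = 1:12

# Under the null hypothesis every month has the same expected count
expected = rep(sum(caught) / length(caught), length(caught))

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2_manual = sum((caught - expected)^2 / expected)
p_manual = pchisq(x2_manual, df = length(caught) - 1, lower.tail = FALSE)

round(x2_manual, 2)  # 997.07, matching the chisq.test() output
all.equal(unname(chisq.test(caught)$statistic), x2_manual)
```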

Tuberculosis Distribution Among Farms

Test if the cases of tuberculosis are uniformly distributed across all farms.

table(deer$Farm, deer$Tb)

      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(deer$Farm, deer$Tb))

Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
be incorrect


    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

Significance

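No, tuberculosis cases are not uniformly distributed across all farms. The low p-value (p = 1.243 × 10⁻¹⁵) indicates a significant association between farm and tuberculosis status. The warning arises because some farms have very small expected counts, so the chi-squared approximation should be treated with caution.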
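
One way to follow up on the warning is to inspect the expected counts and to rerun the test with a Monte Carlo p-value, which does not rely on the chi-squared approximation. A sketch that rebuilds the farm-by-Tb table from the counts printed above; the seed, the number of replicates (B = 2000), and the names tb_table, res, and sim are assumptions for illustration:

```r
# Counts copied from the table(deer$Farm, deer$Tb) output
farms = c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
          "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
          "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg = c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27,
           16, 186, 24, 18, 11, 39, 67, 23, 21, 31,
           0, 3, 16, 9, 16, 13, 15)
tb_pos = c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24,
           5, 31, 4, 1, 0, 0, 7, 1, 0, 0,
           1, 0, 10, 0, 2, 1, 4)
tb_table = cbind("0" = tb_neg, "1" = tb_pos)
rownames(tb_table) = farms

# Standard test (warning suppressed here because it is examined next)
res = suppressWarnings(chisq.test(tb_table))
round(unname(res$statistic), 2)  # 129.09, matching the output above

# Number of cells whose expected count is below 5
sum(res$expected < 5)

# Monte Carlo p-value; seed and B are arbitrary choices (assumptions)
set.seed(1)
sim = chisq.test(tb_table, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

If the simulated p-value is also small, the conclusion of a farm effect does not depend on the approximation that triggered the warning.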