Module09_Exercise: Probability and Inference Tests

Author

Cienna Kim

Published

June 17, 2026

Introduction: Data Loading and Setup

# Check files in the working directory
list.files()

[1] "data"                                 
[2] "u1412840_Module09_Exercises.html"     
[3] "u1412840_Module09_Exercises.qmd"      
[4] "u1412840_Module09_Exercises.rmarkdown"

# Load data
iris_data <- iris
deer <- read.csv("./data/Deer.csv")

# Display structure
str(iris_data)

'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

head(iris_data)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

str(deer)

'data.frame':   1182 obs. of  9 variables:
 $ Farm   : chr  "AL" "AL" "AL" "AL" ...
 $ Month  : int  10 10 10 10 10 10 10 10 10 10 ...
 $ Year   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Sex    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ clas1_4: int  4 4 3 4 4 4 4 4 4 4 ...
 $ LCT    : num  191 180 192 196 204 190 196 200 197 208 ...
 $ KFI    : num  20.4 16.4 15.9 17.3 NA ...
 $ Ecervi : num  0 0 2.38 0 0 0 1.21 0 0.8 0 ...
 $ Tb     : int  0 0 0 0 NA 0 NA 1 0 0 ...

head(deer)

  Farm Month Year Sex clas1_4 LCT   KFI Ecervi Tb
1   AL    10    0   1       4 191 20.45   0.00  0
2   AL    10    0   1       4 180 16.40   0.00  0
3   AL    10    0   1       3 192 15.90   2.38  0
4   AL    10    0   1       4 196 17.30   0.00  0
5   AL    10    0   1       4 204    NA   0.00 NA
6   AL    10    0   1       4 190 16.30   0.00  0

1. t-test: Legolas vs Aragorn & Legolas vs Gimli

# Simulate actor heights using the lecture setup
set.seed(5680)

aragorn <- rnorm(50, mean = 180, sd = 10)
gimli <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# Compare Legolas and Aragorn
t.test(legolas, aragorn, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 7.134, df = 97.754, p-value = 1.713e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 13.34126 23.62431
sample estimates:
mean of x mean of y 
 196.0965  177.6137

# Compare Legolas and Gimli
t.test(legolas, gimli, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 22.38, df = 97.867, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 55.17697 65.91455
sample estimates:
mean of x mean of y 
 196.0965  135.5507

Legolas vs Aragorn

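p = 1.713e-10 (p < 0.05). Therefore, we reject the null hypothesis of equal means: mean height differs significantly between the Legolas and Aragorn actor groups (sample means 196.1 vs. 177.6; 95% CI for the difference 13.3 to 23.6).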

Legolas vs Gimli

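p < 2.2e-16 (p < 0.05). Therefore, we reject the null hypothesis of equal means: mean height differs significantly between the Legolas and Gimli actor groups (sample means 196.1 vs. 135.6; 95% CI for the difference 55.2 to 65.9).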
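
As a cross-check on what `t.test()` reports, the Welch statistic, degrees of freedom, and p-value can be rebuilt from sample means and variances. The sketch below is illustrative only: the vectors `x` and `y` and the seed are assumptions made for this example and are not the draws used in the exercise.

```r
# Sketch: reproduce Welch's two-sample t-test by hand.
# x and y are hypothetical samples (assumed for illustration).
set.seed(1)
x <- rnorm(50, mean = 195, sd = 15)
y <- rnorm(50, mean = 180, sd = 10)

# Squared standard error of each mean (variances are NOT pooled)
se_x2 <- var(x) / length(x)
se_y2 <- var(y) / length(y)
se_diff <- sqrt(se_x2 + se_y2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (mean(x) - mean(y)) / se_diff
df_manual <- (se_x2 + se_y2)^2 /
  (se_x2^2 / (length(x) - 1) + se_y2^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with the built-in test
welch <- t.test(x, y, alternative = "two.sided")
all.equal(t_manual, unname(welch$statistic))
all.equal(df_manual, unname(welch$parameter))
all.equal(p_manual, welch$p.value)
```

Because the variances are not pooled, the degrees of freedom are generally not a whole number, which is why the output above shows values such as 97.754.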

2. F-test: Gimli vs Legolas

# Compare the variances of Gimli and Legolas
var.test(gimli, legolas)


    F test to compare two variances

data:  gimli and legolas
F = 1.0767, num df = 49, denom df = 49, p-value = 0.797
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6109822 1.8972882
sample estimates:
ratio of variances 
          1.076666

Gimli vs Legolas

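p = 0.797 (p > 0.05). Therefore, we fail to reject the null hypothesis of equal variances: there is no statistically significant difference in variance between the Gimli and Legolas actor groups (variance ratio 1.08; the 95% CI of 0.61 to 1.90 includes 1).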
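
The F statistic is simply the ratio of the two sample variances, so the `var.test()` result can be checked by hand. This is a minimal sketch on hypothetical samples `a` and `b` (names, seed, and values are assumptions for illustration, not the exercise's data).

```r
# Sketch: reproduce the two-sided F test for equal variances by hand.
# a and b are hypothetical samples (assumed for illustration).
set.seed(1)
a <- rnorm(50, mean = 132, sd = 15)
b <- rnorm(50, mean = 195, sd = 15)

# F statistic: ratio of sample variances, with n - 1 degrees of freedom each
f_manual <- var(a) / var(b)
df_num   <- length(a) - 1
df_den   <- length(b) - 1

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower  <- pf(f_manual, df_num, df_den)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
ft <- var.test(a, b)
all.equal(f_manual, unname(ft$statistic))
all.equal(p_manual, ft$p.value)
```

The F test assumes both samples come from normal distributions, which holds here by construction because the heights are simulated with `rnorm()`.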

3. Correlation tests: Sepal Length and Sepal Width by Species

# Split the iris data by species
setosa <- subset(iris_data, Species == "setosa")
versicolor <- subset(iris_data, Species == "versicolor")
virginica <- subset(iris_data, Species == "virginica")

# Correlation test for setosa
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)


    Pearson's product-moment correlation

data:  setosa$Sepal.Length and setosa$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467

# Correlation test for versicolor
cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)


    Pearson's product-moment correlation

data:  versicolor$Sepal.Length and versicolor$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107

# Correlation test for virginica
cor.test(virginica$Sepal.Length, virginica$Sepal.Width)


    Pearson's product-moment correlation

data:  virginica$Sepal.Length and virginica$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278

Setosa

r = 0.74, p = 6.71e-10 (p < 0.05). Therefore, Sepal Length and Sepal Width are significantly positively correlated in Setosa.

Versicolor

r = 0.53, p = 8.772e-05 (p < 0.05). Therefore, Sepal Length and Sepal Width are significantly positively correlated in Versicolor.

Virginica

r = 0.46, p = 0.0008435 (p < 0.05). Therefore, Sepal Length and Sepal Width are significantly positively correlated in Virginica.
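
The three correlation tests repeat the same call on different subsets. One possible way to avoid the repetition is to split the data by species and collect the results in a single table. The sketch below uses the built-in `iris` data; the names `by_species` and `cor_summary` are introduced here for illustration only.

```r
# Sketch: run the same Pearson correlation test for every species
# and collect the key numbers in one data frame.
by_species <- split(iris, iris$Species)

cor_summary <- do.call(rbind, lapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  data.frame(
    species = as.character(d$Species[1]),
    r       = unname(ct$estimate),   # Pearson correlation coefficient
    ci_low  = ct$conf.int[1],        # lower bound of the 95% CI
    ci_high = ct$conf.int[2],        # upper bound of the 95% CI
    p_value = ct$p.value
  )
}))

cor_summary
```

Each row should agree with the corresponding individual `cor.test()` output above, for example r = 0.7425467 for setosa.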

4. Chi-squared tests: Deer Data

# Test whether deer caught per month are uniformly distributed
month_counts <- table(factor(deer$Month, levels = 1:12))
month_counts


  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(month_counts)


    Chi-squared test for given probabilities

data:  month_counts
X-squared = 997.07, df = 11, p-value < 2.2e-16

# Test whether TB cases are uniformly distributed across farms
tb_cases_by_farm <- table(deer$Farm[!is.na(deer$Tb) & deer$Tb == 1])
tb_cases_by_farm


  AL   BA   CB   HB  LCV   LN  MAN   MB   MO   NC   NV   QM   RF  SAL   SE   TN 
   3    5    3    1    1    6   24    5   31    4    1    7    1    1   10    2 
VISO   VY 
   1    4

chisq.test(tb_cases_by_farm)


    Chi-squared test for given probabilities

data:  tb_cases_by_farm
X-squared = 189.78, df = 17, p-value < 2.2e-16

Deer Caught per Month

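X-squared = 997.07, df = 11, p < 2.2e-16 (p < 0.05). Therefore, we reject the null hypothesis that equal numbers of deer are caught each month: catches are concentrated in October through February (165 to 256 per month) and drop to 2 or 3 in April and May.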
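
To make the goodness-of-fit calculation explicit, the statistic can be rebuilt from the monthly counts printed above. In this sketch the counts are typed in by hand from that output so the block runs without `Deer.csv`; the variable names are illustrative.

```r
# Sketch: chi-squared goodness-of-fit test against a uniform distribution,
# computed by hand from the monthly counts shown in the output above.
month_counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(month_counts) <- 1:12

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(month_counts) / length(month_counts), length(month_counts))

# Sum of (observed - expected)^2 / expected, with k - 1 degrees of freedom
x2_manual <- sum((month_counts - expected)^2 / expected)
df_manual <- length(month_counts) - 1
p_manual  <- pchisq(x2_manual, df = df_manual, lower.tail = FALSE)

# Compare with the built-in test (uniform probabilities are the default)
gof <- chisq.test(month_counts)
round(x2_manual, 2)  # 997.07, matching the X-squared reported above
all.equal(x2_manual, unname(gof$statistic))
```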

Tuberculosis Cases by Farm

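X-squared = 189.78, df = 17, p < 2.2e-16 (p < 0.05). Therefore, tuberculosis cases are not uniformly distributed across farms: two farms, MO (31 cases) and MAN (24 cases), account for half of the 110 positive cases. This test compares raw case counts, so it does not adjust for how many deer were sampled at each farm.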
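
The farm-level test works on raw counts of positive cases, so a farm with more sampled deer can show more cases even at the same infection rate. A possible complement, sketched here under assumptions, is a chi-squared test of independence on the full Farm by Tb table, which compares prevalence rather than counts. The data frame `toy` and its farms "A", "B", and "C" are entirely hypothetical and stand in for `deer`.

```r
# Sketch: test whether TB status is independent of farm,
# which accounts for the number of deer sampled at each farm.
# `toy` is hypothetical data (assumed for illustration), not Deer.csv.
toy <- data.frame(
  Farm = rep(c("A", "B", "C"), times = c(100, 100, 40)),
  Tb   = c(rep(c(1, 0), times = c(10, 90)),   # farm A: 10 of 100 positive
           rep(c(1, 0), times = c(30, 70)),   # farm B: 30 of 100 positive
           rep(c(1, 0), times = c(4, 36)))    # farm C:  4 of 40 positive
)

# Cross-tabulate farm against TB status (rows with NA are dropped by default)
farm_by_tb <- table(toy$Farm, toy$Tb, dnn = c("Farm", "Tb"))
farm_by_tb

# Prevalence per farm: share of positive animals within each row
prevalence <- prop.table(farm_by_tb, margin = 1)[, "1"]
prevalence

# Chi-squared test of independence between farm and TB status
ind <- chisq.test(farm_by_tb)
ind
```

In this toy example farms A and C have the same prevalence even though their raw case counts differ, which is the distinction the uniformity test cannot see. With the real data, farms with few sampled deer may produce small expected counts, in which case `chisq.test()` issues a warning about the approximation.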