exercise09

Author

Hangu Lee

0. Setting

# Simulate actor heights; no seed is set, so the values change on every run
aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)
legolas = rnorm(50, mean=195, sd=15)
# Load the datasets
iris_data = iris
deer = read.csv("./data/Deer.csv")

1. t-test (Legolas vs Others)

Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

# Legolas vs Aragorn
t.test(legolas, aragorn, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 4.8787, df = 78.976, p-value = 5.44e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.638232 18.166161
sample estimates:
mean of x mean of y 
 194.8437  181.9415

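(p<0.05) There is a highly significant difference in mean height between the Legolas and Aragorn actors (p = 5.44e-06). The Legolas sample mean (194.8) exceeds the Aragorn sample mean (181.9) by about 12.9, with a 95% confidence interval of 7.6 to 18.2 for the difference.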

# Legolas vs Gimli
t.test(legolas, gimli, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 20.606, df = 94.719, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 55.20758 66.98009
sample estimates:
mean of x mean of y 
 194.8437  133.7498

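(p<0.05) There is a highly significant difference in mean height between the Legolas and Gimli actors (p < 2.2e-16). The Legolas sample mean (194.8) exceeds the Gimli sample mean (133.7) by about 61.1, with a 95% confidence interval of 55.2 to 67.0 for the difference.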
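To make the Welch output above easier to read, the following minimal sketch recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand and compares them with `t.test()`. The seed and the `*_manual` names are assumptions made for this illustration, so the numbers will not match the output shown above.

```r
# Simulated heights; the seed is an arbitrary choice for this sketch
set.seed(1)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

res <- t.test(legolas, aragorn, alternative = "two.sided")

# Squared standard error of each sample mean
v1 <- var(legolas) / length(legolas)
v2 <- var(aragorn) / length(aragorn)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(legolas) - mean(aragorn)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(legolas) - 1) + v2^2 / (length(aragorn) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

c(t = t_manual, df = df_manual, p = p_manual)
```

The non-integer degrees of freedom in the output (for example df = 78.976) come from this approximation, which R uses by default because `var.equal = FALSE`.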

2. variance test (Gimli vs Legolas)

Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors. Do these groups have different variance?

# Gimli vs Legolas
var.test(gimli, legolas)


    F test to compare two variances

data:  gimli and legolas
F = 0.68619, num df = 49, denom df = 49, p-value = 0.191
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3893961 1.2091949
sample estimates:
ratio of variances 
         0.6861893

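(p>0.05) There is no statistically significant difference in variance between the Gimli and Legolas actor groups (p = 0.191). The estimated variance ratio is 0.69, and its 95% confidence interval (0.39 to 1.21) includes 1. This is consistent with both samples having been simulated with sd = 15.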
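As a rough check on what `var.test()` reports, the F statistic is simply the ratio of the two sample variances, and the two-sided p-value is twice the smaller tail of the F distribution. The sketch below assumes an arbitrary seed and uses illustrative names (`f_manual`, `p_manual`), so its numbers differ from the output above.

```r
# Simulated heights; the seed is an arbitrary choice for this sketch
set.seed(1)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

res <- var.test(gimli, legolas)

# F statistic: ratio of the sample variances (first argument on top)
f_manual <- var(gimli) / var(legolas)

# Numerator and denominator degrees of freedom
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail probability
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

c(F = f_manual, p = p_manual)
```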

3. correlation (Iris)

Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

# Setosa
setosa_data = subset(iris_data, Species == "setosa")
cor.test(setosa_data$Sepal.Length, setosa_data$Sepal.Width)


    Pearson's product-moment correlation

data:  setosa_data$Sepal.Length and setosa_data$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467

(p<0.05) There is a statistically significant, strong positive correlation (r≈0.74, 95% CI 0.59 to 0.85) between sepal length and sepal width within the Setosa species.

# Versicolor
versicolor_data = subset(iris_data, Species == "versicolor")
cor.test(versicolor_data$Sepal.Length, versicolor_data$Sepal.Width)


    Pearson's product-moment correlation

data:  versicolor_data$Sepal.Length and versicolor_data$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107

(p<0.05) There is a statistically significant, moderate positive correlation (r≈0.53, 95% CI 0.29 to 0.70) between sepal length and sepal width within the Versicolor species.

# Virginica
virginica_data = subset(iris_data, Species == "virginica")
cor.test(virginica_data$Sepal.Length, virginica_data$Sepal.Width)


    Pearson's product-moment correlation

data:  virginica_data$Sepal.Length and virginica_data$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278

(p<0.05) There is a statistically significant, moderate positive correlation (r≈0.46, 95% CI 0.20 to 0.65) between sepal length and sepal width within the Virginica species. It is the weakest of the three.
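The three per-species tests can also be run in one pass, which makes the coefficients easier to compare side by side. This is a minimal sketch using `split()` and `sapply()` on the built-in `iris` data; the name `cor_summary` and its column labels are chosen here for illustration.

```r
# Split the built-in iris data into one data frame per species
by_species <- split(iris, iris$Species)

# Run cor.test() per species and keep the estimate, 95% CI, and p-value
cor_summary <- t(sapply(by_species, function(d) {
  res <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r       = unname(res$estimate),
    ci_low  = res$conf.int[1],
    ci_high = res$conf.int[2],
    p_value = res$p.value)
}))

# One row per species, one column per quantity
round(cor_summary, 4)
```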

4. chi-squared tests (Deer)

Test whether there are significant differences in the number of deer caught per month.

month_table = table(deer$Month)
chisq.test(month_table)


    Chi-squared test for given probabilities

data:  month_table
X-squared = 997.07, df = 11, p-value < 2.2e-16

(p<0.05) The number of deer caught is not uniformly distributed across months (X-squared = 997.07, df = 11, p < 2.2e-16). Catches differ significantly between months, which is consistent with seasonal variation.
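When `chisq.test()` is given a one-dimensional table, it tests the counts against equal expected frequencies. The sketch below reproduces that calculation by hand. The monthly counts are hypothetical values invented for this illustration, not taken from Deer.csv.

```r
# Hypothetical monthly counts, NOT from Deer.csv
counts <- setNames(c(12, 15, 20, 28, 35, 40, 38, 30, 25, 18, 14, 10), month.abb)

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(counts) / length(counts), length(counts))

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2_manual <- sum((counts - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_manual <- pchisq(x2_manual, df = length(counts) - 1, lower.tail = FALSE)

res <- chisq.test(counts)
c(X2 = x2_manual, p = p_manual)
```

With 12 months the test has 11 degrees of freedom, matching df = 11 in the output above.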

Test whether the cases of tuberculosis are uniformly distributed across all farms.

tb_deer = subset(deer, Tb == 1)
farm_tb_table = table(tb_deer$Farm)
chisq.test(farm_tb_table)


    Chi-squared test for given probabilities

data:  farm_tb_table
X-squared = 189.78, df = 17, p-value < 2.2e-16

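(p<0.05) Tuberculosis cases are not uniformly distributed across farms (X-squared = 189.78, df = 17, p < 2.2e-16), indicating that infections are concentrated in specific farms. Note that this test compares raw case counts, so differences in how many deer were sampled on each farm can also contribute to the result.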
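Two details of the approach above are worth checking. If `Farm` is read as a character column, `table()` on the TB-positive subset drops any farm with zero cases. Raw case counts also ignore how many deer were sampled per farm. The sketch below illustrates both points on a small hypothetical data frame (`toy`, with made-up farms A, B, and C). It is an illustration under those assumptions, not a re-analysis of Deer.csv.

```r
# Hypothetical data: farm labels and TB status are invented for illustration
toy <- data.frame(
  Farm = rep(c("A", "B", "C"), times = c(80, 80, 40)),
  Tb   = c(rep(c(1, 0), times = c(20, 60)),  # farm A: 20 of 80 positive
           rep(0, 80),                       # farm B: no positives
           rep(c(1, 0), times = c(10, 30))), # farm C: 10 of 40 positive
  stringsAsFactors = FALSE
)

# Counting only the positives silently drops farm B
dropped <- table(subset(toy, Tb == 1)$Farm)

# Fixing the factor levels first keeps farms with zero cases
farm <- factor(toy$Farm, levels = sort(unique(toy$Farm)))
tb_counts <- table(farm[toy$Tb == 1])

# A farm-by-status table compares prevalence and accounts for sample size
farm_by_tb <- table(toy$Farm, toy$Tb)
prev_test  <- chisq.test(farm_by_tb)

tb_counts
prev_test
```

In the uniformity test, the degrees of freedom equal the number of farms in the table minus one, so a farm missing from the table also changes df.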