Strange_m9_quarto

Author

Vivian Strange

Module 9 Exercise - 06/15/2025

Load the data files (assumed to be in the working directory) and simulate the actor heights:

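deer <- read.csv("Deer.csv")   # deer capture records
iris <- read.csv("iris.csv")   # iris flower measurements
# Simulated heights for 50 actors in each role
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)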
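The actor heights are drawn at random, so every render of this document produces different values. As a result, numbers quoted in the prose can drift away from the printed output. Below is a minimal sketch of one way to pin them down, assuming it is acceptable to fix the random seed. The seed value 42 is an arbitrary, hypothetical choice.

```r
# Hypothetical seed value; any fixed integer gives reproducible draws
set.seed(42)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# Re-seeding with the same value reproduces exactly the same draws
set.seed(42)
aragorn_again <- rnorm(50, mean = 180, sd = 10)
identical(aragorn, aragorn_again)   # TRUE
```

Calling set.seed() once, before the first rnorm() call, is enough. The later draws for gimli and legolas then follow deterministically from it.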

1. Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis. Do you find evidence for significant differences?

Legolas vs Aragorn:

t.test(legolas, aragorn, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and aragorn
t = 7.2457, df = 88.581, p-value = 1.519e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 13.47492 23.65874
sample estimates:
mean of x mean of y 
 195.4503  176.8835 

When comparing the Legolas and Aragorn datasets, we received a p-value of 1.519e-10, which is smaller than the significance level of 0.05. In addition, the 95% confidence interval for the difference in means (13.47 to 23.66) does not include 0. Therefore, we reject the null hypothesis: there is a statistically significant difference between the mean heights of Legolas actors and Aragorn actors.
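The Welch statistic reported by t.test() can be reproduced by hand, which shows where t, df and the p-value come from. This is a sketch, not the exercise's own code. The original vectors were generated without a seed, so the block re-simulates them with a hypothetical seed. Its numbers will therefore match t.test() on the same draws, but not the output above. The helper name welch_t is also hypothetical.

```r
# Hypothetical seed so the block runs on its own
set.seed(42)
aragorn <- rnorm(50, mean = 180, sd = 10)
legolas <- rnorm(50, mean = 195, sd = 15)

# Hypothetical helper: Welch two-sample t-test computed from first principles
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

manual  <- welch_t(legolas, aragorn)
builtin <- t.test(legolas, aragorn, alternative = "two.sided")
manual
```

Because the two groups are not assumed to share a variance, the degrees of freedom are usually fractional. This is why the output above reports df = 88.581 rather than 98.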

Legolas vs Gimli:

t.test(legolas, gimli, alternative = "two.sided")

    Welch Two Sample t-test

data:  legolas and gimli
t = 20.771, df = 97.921, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 56.24039 68.12213
sample estimates:
mean of x mean of y 
 195.4503  133.2690 

When comparing the Legolas and Gimli datasets, we received a p-value below 2.2e-16, which is far smaller than the significance level of 0.05. In addition, the 95% confidence interval for the difference in means (56.24 to 68.12) does not include 0. Therefore, we reject the null hypothesis: there is a statistically significant difference between the mean heights of Legolas actors and Gimli actors.
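The "< 2.2e-16" in the output is a display limit, not the p-value itself. Printed test results show any p-value below .Machine$double.eps (about 2.220446e-16) in that form. The exact value is still stored in the result object. This sketch re-simulates the data with a hypothetical seed, so its numbers differ from the output above.

```r
# Hypothetical seed so the block runs on its own
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

res <- t.test(legolas, gimli, alternative = "two.sided")
.Machine$double.eps    # the display threshold behind "< 2.2e-16"
res$p.value            # exact p-value stored in the htest object
res$p.value < 0.05     # decision at the 0.05 significance level
```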

2. Re-run the variance test (F-test) to compare the group of Gimli and Legolas actors.

var.test(gimli, legolas)

    F test to compare two variances

data:  gimli and legolas
F = 1.0585, num df = 49, denom df = 49, p-value = 0.8432
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6006459 1.8651906
sample estimates:
ratio of variances 
          1.058451 

Do these groups have different variance?

No, the Gimli and Legolas groups do not differ significantly in variance, so we fail to reject the null hypothesis. The p-value is 0.8432, which is not less than the significance level of 0.05. Furthermore, the variance ratio of 1 expected under the null hypothesis falls within the 95% confidence interval of 0.6006459 to 1.8651906.
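The F statistic from var.test() is simply the ratio of the two sample variances. Its p-value and confidence interval both come from the F distribution with (n1 - 1, n2 - 1) degrees of freedom. This sketch reproduces all three by hand. The data are re-simulated with a hypothetical seed, so the values will differ from the output above.

```r
# Hypothetical seed so the block runs on its own
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

# F statistic: ratio of the sample variances, first group over second
f_manual <- var(gimli) / var(legolas)
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1

# Two-sided p-value: twice the smaller tail of the F(df1, df2) distribution
p_manual <- 2 * min(pf(f_manual, df1, df2),
                    pf(f_manual, df1, df2, lower.tail = FALSE))

# 95% confidence interval for the true variance ratio
ci_manual <- f_manual / qf(c(0.975, 0.025), df1, df2)

ft <- var.test(gimli, legolas)
c(F = f_manual, p.value = p_manual, lower = ci_manual[1], upper = ci_manual[2])
```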

3. Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species. Are these correlated?

Setosa:

cor.test((iris$Sepal.Length[iris$Species=="setosa"]), (iris$Sepal.Width[iris$Species=="setosa"]))

    Pearson's product-moment correlation

data:  (iris$Sepal.Length[iris$Species == "setosa"]) and (iris$Sepal.Width[iris$Species == "setosa"])
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467 

The correlation coefficient of 0.7425467 indicates a strong positive correlation between Sepal Length and Sepal Width in the Setosa species. The 95% confidence interval for the correlation (0.5851391 to 0.8460314) does not include 0. The p-value of 6.71e-10 is less than the significance level of 0.05, confirming that the correlation is statistically significant.
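The t value reported by cor.test() is derived directly from the correlation coefficient as t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. This sketch assumes iris.csv holds the same values as R's built-in datasets::iris. It uses the built-in copy so the block runs without the CSV file.

```r
# Assumption: datasets::iris matches the iris.csv file used above
setosa <- subset(datasets::iris, Species == "setosa")

r <- cor(setosa$Sepal.Length, setosa$Sepal.Width)
n <- nrow(setosa)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)       # test statistic for H0: rho = 0
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)    # two-sided p-value

ct <- cor.test(setosa$Sepal.Length, setosa$Sepal.Width)
c(r = r, t = t_manual, p.value = p_manual)
```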

Versicolor:

cor.test((iris$Sepal.Length[iris$Species=="versicolor"]), (iris$Sepal.Width[iris$Species=="versicolor"]))

    Pearson's product-moment correlation

data:  (iris$Sepal.Length[iris$Species == "versicolor"]) and (iris$Sepal.Width[iris$Species == "versicolor"])
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107 

The correlation coefficient of 0.5259107 indicates a moderate positive correlation between Sepal Length and Sepal Width in the Versicolor species. The 95% confidence interval for the correlation (0.2900175 to 0.7015599) does not include 0. The p-value of 8.772e-05 is less than the significance level of 0.05, confirming that the correlation is statistically significant.
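The confidence interval for a Pearson correlation is built on Fisher's z scale, where z = atanh(r) has a standard error of about 1/sqrt(n - 3). The interval is then back-transformed with tanh(). This sketch reproduces the Versicolor interval, again assuming iris.csv matches the built-in datasets::iris.

```r
# Assumption: datasets::iris matches the iris.csv file used above
versicolor <- subset(datasets::iris, Species == "versicolor")

r  <- cor(versicolor$Sepal.Length, versicolor$Sepal.Width)
n  <- nrow(versicolor)
z  <- atanh(r)               # Fisher z-transform of r
se <- 1 / sqrt(n - 3)        # approximate standard error of z

# 95% interval on the z scale, back-transformed to the correlation scale
ci_manual <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

ct <- cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)
ci_manual
```

Because tanh() is nonlinear, the interval is not symmetric around r. The upper bound (0.70) sits closer to 0.53 than the lower bound (0.29) does.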

Virginica:

cor.test((iris$Sepal.Length[iris$Species=="virginica"]), (iris$Sepal.Width[iris$Species=="virginica"]))

    Pearson's product-moment correlation

data:  (iris$Sepal.Length[iris$Species == "virginica"]) and (iris$Sepal.Width[iris$Species == "virginica"])
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278 

The correlation coefficient of 0.4572278 indicates a moderate positive correlation between Sepal Length and Sepal Width in the Virginica species. The 95% confidence interval for the correlation (0.2049657 to 0.6525292) does not include 0. The p-value of 0.0008435 is less than the significance level of 0.05, confirming that the correlation is statistically significant.
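Instead of three near-identical cor.test() calls, the data can be split by species and tested in one pass. The sketch below collects r, the confidence bounds and the p-value into a single table. It assumes iris.csv matches the built-in datasets::iris.

```r
# Assumption: datasets::iris matches the iris.csv file used above
by_species <- split(datasets::iris, datasets::iris$Species)

# Run cor.test() on each species subset and keep the key results
results <- t(sapply(by_species, function(d) {
  ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r       = unname(ct$estimate),
    lower   = ct$conf.int[1],
    upper   = ct$conf.int[2],
    p.value = ct$p.value)
}))

round(results, 4)
```

Each row of results corresponds to one species, which makes it easy to see that the correlation weakens from Setosa to Versicolor to Virginica.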

4. Using the deer dataset and the chisq.test() function:

deer = read.csv("Deer.csv")

Test if there are significant differences in the number of deer caught per month.

table(deer$Month)

  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188 

chisq.test(table(deer$Month))

    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16

Yes, there are significant differences in the number of deer caught per month. When no probabilities are specified, chisq.test() compares the observed monthly counts against equal expected counts for every month. The p-value is below 2.2e-16, much less than the significance level of 0.05. We therefore reject the null hypothesis that deer were caught uniformly across months.
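The goodness-of-fit statistic can be checked by hand from the monthly counts printed above. Each month's expected count under uniformity is the total divided by 12. The statistic is sum((O - E)^2 / E) with 12 - 1 = 11 degrees of freedom. The counts in this sketch are copied from the table(deer$Month) output, so the block runs without Deer.csv.

```r
# Monthly counts copied from the table(deer$Month) output above (months 1-12)
counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)

# Under H0 every month has the same expected count
expected <- rep(sum(counts) / length(counts), length(counts))

x2_manual <- sum((counts - expected)^2 / expected)
df <- length(counts) - 1
p_manual <- pchisq(x2_manual, df, lower.tail = FALSE)

ct <- chisq.test(counts)   # same default uniform probabilities
c(X2 = x2_manual, df = df)
```

With 1121 deer in total, each month would be expected to contribute about 93.4 captures. January's 256 and May's 2 are far from that, and months like these dominate the statistic.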

Test if the cases of tuberculosis are uniformly distributed across all farms.

table(deer$Farm, deer$Tb)
      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(deer$Farm, deer$Tb))
Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

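The cases of tuberculosis are not uniformly distributed across all farms. On a two-way table, chisq.test() tests whether TB status is independent of farm, that is, whether the proportion of TB-positive deer is the same on every farm. The p-value of 1.243e-15 is much less than the significance level of 0.05, so we reject the null hypothesis of independence. However, the warning indicates that some expected cell counts are small because several farms contributed very few deer. This p-value therefore relies on an approximation and should be interpreted with caution.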
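The warning can be investigated by inspecting the expected counts that chisq.test() stores in its result. A Monte Carlo p-value (simulate.p.value = TRUE) then provides a check that does not rely on the chi-squared approximation. In this sketch, the counts are copied from the table(deer$Farm, deer$Tb) output above. The seed is a hypothetical choice, and the simulated p-value varies slightly from run to run.

```r
# Counts copied from table(deer$Farm, deer$Tb) above: column "0" = no TB, "1" = TB
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb0 <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39, 67,
         23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb1 <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0, 7, 1, 0, 0, 1,
         0, 10, 0, 2, 1, 4)
tab <- cbind(`0` = tb0, `1` = tb1)
rownames(tab) <- farms

# Standard Pearson test; the small-count warning is suppressed here on purpose
res <- suppressWarnings(chisq.test(tab))
sum(res$expected < 5)    # how many cells fall below the usual threshold of 5

# Monte Carlo p-value based on 2000 simulated tables with the same margins
set.seed(42)   # hypothetical seed
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res_sim$p.value
```

With B = 2000, the smallest possible simulated p-value is 1/2001, which is still far below 0.05. The conclusion of non-independence therefore does not hinge on the chi-squared approximation.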