Module09

Author

Hyunjeong Sin

Read the data

deer <- read.csv("Deer.csv")
iris <- read.csv("iris.csv")

Compare the Legolas actors to the set of Aragorns and then the set of Gimlis

Make datasets

legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli <- rnorm(50, mean = 132, sd = 15)
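
Because rnorm() draws new random values on every run, the simulated heights change each time the document is rendered. A minimal sketch of making the draws repeatable with set.seed() follows; the seed value 42 and the names legolas_a and legolas_b are arbitrary assumptions for illustration, not part of the original analysis.

```r
# Fix the random seed so the simulated heights are identical on every run.
# The seed value 42 is an arbitrary choice made for this illustration.
set.seed(42)
legolas_a <- rnorm(50, mean = 195, sd = 15)

# Resetting the same seed reproduces exactly the same 50 values.
set.seed(42)
legolas_b <- rnorm(50, mean = 195, sd = 15)

identical(legolas_a, legolas_b)
```

Calling set.seed() once, before the three rnorm() calls, would be enough to keep the printed test results and the written interpretation in step.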

Run a t-test

t.test(legolas, aragorn, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and aragorn
t = 4.4033, df = 92.671, p-value = 2.855e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.104228 16.133647
sample estimates:
mean of x mean of y 
 193.0893  181.9704

The first t-test, comparing Legolas and Aragorn, gives a p-value of 2.855e-05, which is well below the common significance level of 0.05. The difference in means between the two groups is therefore statistically significant.

t.test(legolas, gimli, alternative = "two.sided")


    Welch Two Sample t-test

data:  legolas and gimli
t = 20.69, df = 96.304, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 56.47346 68.45883
sample estimates:
mean of x mean of y 
 193.0893  130.6231

In the second t-test, which compared Legolas and Gimli, the p-value was less than 2.2e-16, also suggesting a statistically significant difference between the groups. Therefore, significant differences in the means are found between Legolas and the other groups in both comparisons.
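
The values quoted in the interpretation can be read directly from the object that t.test() returns instead of being copied by hand. The sketch below assumes freshly simulated data (names legolas_sim and aragorn_sim, seed 1), so its numbers will differ from the output printed above.

```r
# Simulate two groups; the seed and the *_sim names are assumptions for this sketch.
set.seed(1)
legolas_sim <- rnorm(50, mean = 195, sd = 15)
aragorn_sim <- rnorm(50, mean = 180, sd = 10)

# t.test() returns a list of class "htest"; keep it instead of only printing it.
tt <- t.test(legolas_sim, aragorn_sim, alternative = "two.sided")

tt$p.value             # p-value of the Welch two-sample t-test
tt$conf.int            # 95 percent confidence interval for the difference in means
tt$estimate            # the two sample means
signif(tt$p.value, 4)  # rounded to four significant digits for reporting
```

Pulling the numbers from the object in this way keeps the reported p-value consistent with the test output.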

Compare the group of Gimli and Legolas actors by running a variance test (F-test)

var.test(gimli, legolas)


    F test to compare two variances

data:  gimli and legolas
F = 1.306, num df = 49, denom df = 49, p-value = 0.3532
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.7411454 2.3014850
sample estimates:
ratio of variances 
          1.306038

The variances of the two groups are not significantly different. The F-test comparing the variances of Gimli and Legolas gives F = 1.306 and a p-value of 0.3532, which is greater than the standard significance level of 0.05, and the 95 percent confidence interval for the ratio of variances (0.74 to 2.30) includes 1. There is therefore no evidence that the variances of the two groups differ.
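
The F statistic reported by var.test() is the ratio of the two sample variances, and the null hypothesis of equal variances is rejected only when the p-value falls below the chosen level. A minimal sketch on freshly simulated data (the seed and the *_sim names are assumptions, so the numbers differ from the output above):

```r
# Simulated groups for illustration only; seed and names are assumptions.
set.seed(1)
gimli_sim <- rnorm(50, mean = 132, sd = 15)
legolas_sim <- rnorm(50, mean = 195, sd = 15)

# The F statistic is the variance of the first sample divided by that of the second.
f_by_hand <- var(gimli_sim) / var(legolas_sim)

ft <- var.test(gimli_sim, legolas_sim)
all.equal(unname(ft$statistic), f_by_hand)  # the two values agree

# Decision rule at the 0.05 level: TRUE means the variances differ significantly.
variances_differ <- ft$p.value < 0.05
variances_differ
```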

Do the correlation for the Sepal Length and Sepal Width

Make subsets

setosa <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
virginica <- subset(iris, Species == "virginica")

Run Correlation tests

cor.test(setosa$Sepal.Length, setosa$Sepal.Width)


    Pearson's product-moment correlation

data:  setosa$Sepal.Length and setosa$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5851391 0.8460314
sample estimates:
      cor 
0.7425467

Setosa: The correlation coefficient is 0.7425, with a p-value of 6.71e-10. This indicates a strong, statistically significant positive correlation.

cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)


    Pearson's product-moment correlation

data:  versicolor$Sepal.Length and versicolor$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2900175 0.7015599
sample estimates:
      cor 
0.5259107

Versicolor: The correlation coefficient is 0.5259, with a p-value of 8.772e-05. This indicates a moderate, statistically significant positive correlation.

cor.test(virginica$Sepal.Length, virginica$Sepal.Width)


    Pearson's product-moment correlation

data:  virginica$Sepal.Length and virginica$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2049657 0.6525292
sample estimates:
      cor 
0.4572278

Virginica: The correlation coefficient is 0.4572, with a p-value of 0.0008435. This indicates a weaker, yet still statistically significant, positive correlation. In summary, all three species exhibit significant positive correlations between sepal length and sepal width.
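
The three separate tests can also be run in one pass by splitting the data by species. This sketch uses R's built-in iris data (stored here as iris_builtin) on the assumption that it has the same columns as iris.csv; the helper names are hypothetical.

```r
# Use the copy of iris that ships with R; assumed to match the structure of iris.csv.
iris_builtin <- datasets::iris

# Split the data frame into one data frame per species.
by_species <- split(iris_builtin, iris_builtin$Species)

# Pearson correlation and its p-value for each species.
cors <- sapply(by_species, function(d) cor(d$Sepal.Length, d$Sepal.Width))
pvals <- sapply(by_species, function(d) {
  cor.test(d$Sepal.Length, d$Sepal.Width)$p.value
})

# One row per species, ready to compare side by side.
data.frame(cor = round(cors, 4), p.value = signif(pvals, 4))
```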

Using the deer dataset and the chisq.test()

If there are significant differences in the number of deer caught per month

table(deer$Month)


  1   2   3   4   5   6   7   8   9  10  11  12 
256 165  27   3   2  35  11  19  58 168 189 188

chisq.test(table(deer$Month))


    Chi-squared test for given probabilities

data:  table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16
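
When chisq.test() is given a one-way table, it compares the observed counts with equal expected counts in every category. The statistic above can be reproduced by hand from the monthly counts shown in the table; the variable names in this sketch are chosen for illustration.

```r
# Monthly counts copied from the table(deer$Month) output.
observed <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)

# Under the null hypothesis every month has the same expected count.
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
x2 <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

round(x2, 2)  # 997.07, matching the X-squared value reported by chisq.test()
dof           # 11
```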

The chi-squared test on the distribution of deer across months yields a chi-squared statistic of 997.07 with 11 degrees of freedom and a p-value of < 2.2e-16. Since the p-value is far below the standard significance level (e.g., 0.05), the number of deer caught varies significantly across months.

If the cases of tuberculosis are uniformly distributed across all farms

table(deer$Farm, deer$Tb)

      
         0   1
  AL    10   3
  AU    23   0
  BA    67   5
  BE     7   0
  CB    88   3
  CRC    4   0
  HB    22   1
  LCV    0   1
  LN    28   6
  MAN   27  24
  MB    16   5
  MO   186  31
  NC    24   4
  NV    18   1
  PA    11   0
  PN    39   0
  QM    67   7
  RF    23   1
  RN    21   0
  RO    31   0
  SAL    0   1
  SAU    3   0
  SE    16  10
  TI     9   0
  TN    16   2
  VISO  13   1
  VY    15   4

chisq.test(table(deer$Farm, deer$Tb))

Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
be incorrect


    Pearson's Chi-squared test

data:  table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15

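The chi-squared test comparing farm and tuberculosis (Tb) status yielded a chi-squared value of 129.09 with 26 degrees of freedom and a p-value of 1.243e-15. The very small p-value indicates a statistically significant association between farm and Tb status, suggesting that Tb cases are not evenly distributed across farms. Because several farms contributed only a few deer, some expected counts are small, which is why R warns that the chi-squared approximation may be incorrect, so the exact p-value should be read with some caution.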
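
The warning printed with this test usually means that some expected cell counts are small. One way to check this, and to obtain a p-value that does not rely on the chi-squared approximation, is sketched below. It uses only five farms copied from the table above (AL, AU, MAN, MO, SE) as an illustrative subset, so its results are not those of the full analysis; the seed and B = 2000 are arbitrary assumptions.

```r
# Illustrative subset of the Farm x Tb table (counts copied from the output above).
tb_subset <- matrix(
  c(10, 3,
    23, 0,
    27, 24,
    186, 31,
    16, 10),
  ncol = 2, byrow = TRUE,
  dimnames = list(Farm = c("AL", "AU", "MAN", "MO", "SE"), Tb = c("0", "1"))
)

# Expected counts under independence; cells below 5 trigger the warning.
expected_counts <- suppressWarnings(chisq.test(tb_subset))$expected
sum(expected_counts < 5)

# Monte Carlo p-value, which does not depend on the large-sample approximation.
set.seed(1)  # arbitrary seed so the simulated p-value is repeatable
sim <- chisq.test(tb_subset, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

With the full data, the same call would be chisq.test(table(deer$Farm, deer$Tb), simulate.p.value = TRUE).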