Module09
Read the data
deer <- read.csv("Deer.csv")
iris <- read.csv("iris.csv")
Compare the Legolas actors to the set of Aragorns and then the set of Gimlis
Make datasets
legolas = rnorm(50, mean=195, sd=15)
aragorn = rnorm(50, mean=180, sd=10)
gimli = rnorm(50, mean=132, sd=15)
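Because rnorm() draws new random values on every run, the statistics printed in this report change each time the document is knitted. A minimal sketch of making the samples reproducible, assuming a seed of 42 (an arbitrary choice that is not part of the original analysis, so it will not reproduce the exact outputs shown here):

```r
# Fix the random number generator state so every run draws the same samples.
# The seed value 42 is an arbitrary assumption, not taken from the original analysis.
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)

# Resetting the seed and redrawing gives identical values.
set.seed(42)
legolas_again <- rnorm(50, mean = 195, sd = 15)
identical(legolas, legolas_again)
```

With a fixed seed, the numbers quoted in the written interpretation stay in step with the printed test output across runs.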
Run a t-test
t.test(legolas, aragorn, alternative = "two.sided")
Welch Two Sample t-test
data: legolas and aragorn
t = 4.4033, df = 92.671, p-value = 2.855e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.104228 16.133647
sample estimates:
mean of x mean of y
193.0893 181.9704
The p-value from the first t-test, comparing Legolas and Aragorn, is 2.855e-05, which is well below the common significance level of 0.05. This indicates that the difference in means between the two groups is statistically significant; the 95 percent confidence interval for the difference (about 6.1 to 16.1) does not include 0.
t.test(legolas, gimli, alternative = "two.sided")
Welch Two Sample t-test
data: legolas and gimli
t = 20.69, df = 96.304, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
56.47346 68.45883
sample estimates:
mean of x mean of y
193.0893 130.6231
In the second t-test, which compared Legolas and Gimli, the p-value was less than 2.2e-16, also suggesting a statistically significant difference between the groups. Therefore, significant differences in the means are found between Legolas and the other groups in both comparisons.
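Copying numbers from printed output into the text by hand is error-prone, so it can help to store each test result and read the values from the object. The sketch below assumes illustrative samples drawn with an arbitrary seed, and the object names leg_vs_ara and leg_vs_gim are hypothetical; its numbers will not match the output printed above.

```r
# Illustrative samples; the seed is an assumption and will not match the report's output.
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)
gimli   <- rnorm(50, mean = 132, sd = 15)

# t.test() returns an "htest" list; store it rather than only printing it.
leg_vs_ara <- t.test(legolas, aragorn, alternative = "two.sided")
leg_vs_gim <- t.test(legolas, gimli, alternative = "two.sided")

# Pull the reported quantities directly from the result objects.
p_values <- c(aragorn = leg_vs_ara$p.value, gimli = leg_vs_gim$p.value)
signif(p_values, 4)

# Confidence interval for the difference in means (Legolas minus Aragorn).
leg_vs_ara$conf.int
```

In an R Markdown document, the same values can be placed in the prose with inline R code, which keeps the text and the output consistent.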
Compare the group of Gimli and Legolas actors by running a variance test (F-test)
var.test(gimli, legolas)
F test to compare two variances
data: gimli and legolas
F = 1.306, num df = 49, denom df = 49, p-value = 0.3532
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7411454 2.3014850
sample estimates:
ratio of variances
1.306038
The variances of the two groups are not significantly different. The F-test comparing the variances of Gimli and Legolas gives F = 1.306 with a p-value of 0.3532, which is above the standard significance level of 0.05, and the 95 percent confidence interval for the ratio of variances (about 0.74 to 2.30) includes 1. Therefore, there is no evidence that the variances of the two groups differ.
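The F statistic reported by var.test() is the ratio of the two sample variances, and its two-sided p-value comes from the F distribution with n1 - 1 and n2 - 1 degrees of freedom. A minimal sketch that reproduces both by hand, assuming illustrative samples drawn with an arbitrary seed (so the values differ from the output above):

```r
# Illustrative samples; the seed is an assumption, not part of the original analysis.
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

f_result <- var.test(gimli, legolas)

# F is the ratio of the sample variances, first argument over second.
f_by_hand <- var(gimli) / var(legolas)

# Two-sided p-value from the F distribution with (n1 - 1, n2 - 1) degrees of freedom.
df1 <- length(gimli) - 1
df2 <- length(legolas) - 1
p_lower <- pf(f_by_hand, df1, df2)
p_by_hand <- 2 * min(p_lower, 1 - p_lower)

# Both comparisons should report TRUE.
all.equal(unname(f_result$statistic), f_by_hand)
all.equal(f_result$p.value, p_by_hand)
```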
Do the correlation for the Sepal Length and Sepal Width
Make subsets
setosa <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
virginica <- subset(iris, Species == "virginica")
Run Correlation tests
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)
Pearson's product-moment correlation
data: setosa$Sepal.Length and setosa$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5851391 0.8460314
sample estimates:
cor
0.7425467
Setosa: The correlation coefficient is 0.7425, with a p-value of 6.71e-10. This indicates a strong, statistically significant positive correlation.
cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)
Pearson's product-moment correlation
data: versicolor$Sepal.Length and versicolor$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2900175 0.7015599
sample estimates:
cor
0.5259107
Versicolor: The correlation coefficient is 0.5259 with a p-value of 8.77e-05, showing a moderate, significant positive correlation.
cor.test(virginica$Sepal.Length, virginica$Sepal.Width)
Pearson's product-moment correlation
data: virginica$Sepal.Length and virginica$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2049657 0.6525292
sample estimates:
cor
0.4572278
Virginica: The correlation coefficient is 0.4572, with a p-value of 0.0008435. This indicates a weaker, yet still statistically significant, positive correlation. In summary, all three species exhibit significant positive correlations between sepal length and sepal width.
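The three correlation tests can also be run in one pass and collected into a single table. The sketch below assumes that iris.csv has the same columns and values as R's built-in iris data set; the names iris_df and cor_summary are hypothetical. It also shows how the t statistic follows from r and the sample size.

```r
# Assumption: iris.csv matches the built-in iris data set.
iris_df <- datasets::iris

# One data frame per species.
species_list <- split(iris_df, iris_df$Species)

# Run cor.test() for each species and keep r, t, and the p-value.
cor_summary <- t(sapply(species_list, function(d) {
  res <- cor.test(d$Sepal.Length, d$Sepal.Width)
  c(r = unname(res$estimate), t = unname(res$statistic), p = res$p.value)
}))
round(cor_summary, 4)

# The t statistic follows from r and n: t = r * sqrt(n - 2) / sqrt(1 - r^2).
r <- cor_summary["setosa", "r"]
n <- nrow(species_list$setosa)
t_by_hand <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_by_hand
```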
Using the deer dataset and the chisq.test()
If there are significant differences in the number of deer caught per month
table(deer$Month)
1 2 3 4 5 6 7 8 9 10 11 12
256 165 27 3 2 35 11 19 58 168 189 188
chisq.test(table(deer$Month))
Chi-squared test for given probabilities
data: table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16
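The statistic above can be reproduced by hand, which makes the null hypothesis explicit: with no probabilities supplied, chisq.test() assumes every month is equally likely. A minimal sketch using the monthly counts copied from the table above (the variable names are illustrative):

```r
# Monthly counts copied from the table(deer$Month) output above.
observed <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(observed) <- 1:12

# Under the null hypothesis every month is equally likely.
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson statistic: sum of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(x_squared, df, lower.tail = FALSE)

round(x_squared, 2)  # 997.07, matching the chisq.test() output
```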
The chi-squared test on the distribution of deer across months yields a chi-squared statistic of 997.07 with 11 degrees of freedom and a p-value below 2.2e-16. Since the p-value is far below the standard significance level of 0.05, the number of deer caught varies significantly across months.
If the cases of tuberculosis are uniformly distributed across all farms
table(deer$Farm, deer$Tb)
0 1
AL 10 3
AU 23 0
BA 67 5
BE 7 0
CB 88 3
CRC 4 0
HB 22 1
LCV 0 1
LN 28 6
MAN 27 24
MB 16 5
MO 186 31
NC 24 4
NV 18 1
PA 11 0
PN 39 0
QM 67 7
RF 23 1
RN 21 0
RO 31 0
SAL 0 1
SAU 3 0
SE 16 10
TI 9 0
TN 16 2
VISO 13 1
VY 15 4
chisq.test(table(deer$Farm, deer$Tb))
Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may
be incorrect
Pearson's Chi-squared test
data: table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15
The chi-squared test comparing farm and tuberculosis (Tb) status yielded a chi-squared value of 129.09 with 26 degrees of freedom and a p-value of 1.243e-15. The very small p-value indicates a statistically significant association between farm and Tb status, suggesting that Tb cases are not evenly distributed across farms. Because several farms contributed only a handful of deer, R warns that the chi-squared approximation may be incorrect, so the exact p-value should be read with some caution.
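The warning from chisq.test() usually means that some expected cell counts are small. One way to check this, and to obtain a p-value that does not rely on the chi-squared approximation, is sketched below. The counts are copied from the table above; the seed and the number of replicates B = 2000 are arbitrary assumptions, and farm_tb is a hypothetical name.

```r
# Counts copied from the table(deer$Farm, deer$Tb) output above.
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN", "MB",
           "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO", "SAL", "SAU",
           "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39, 67,
            23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0, 7, 1, 0, 0, 1,
            0, 10, 0, 2, 1, 4)
farm_tb <- cbind("0" = tb_neg, "1" = tb_pos)
rownames(farm_tb) <- farms

# Expected counts under independence; a common rule of thumb wants these to be at least 5.
asymptotic <- suppressWarnings(chisq.test(farm_tb))
sum(asymptotic$expected < 5)

# Monte Carlo p-value, which does not rely on the chi-squared approximation.
set.seed(1)  # arbitrary seed so the simulated p-value is repeatable
simulated <- chisq.test(farm_tb, simulate.p.value = TRUE, B = 2000)
simulated$p.value
```

If the simulated p-value is also very small, the conclusion that Tb cases are unevenly distributed across farms does not depend on the approximation that triggered the warning.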