setwd("/Users/u6022167/Desktop/GEOG5680/Module9")
aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)
Module 9: Probability and Statistical Inference
T-Tests
Run a t-test to compare the Legolas actors to the set of Aragorns and then the set of Gimlis.
Legolas vs. Aragorn
- Ho: The Legolas and Aragorn actors have the same mean height.
- Ha: The Legolas and Aragorn actors have different mean heights.
Legolas vs. Gimli
- Ho: The Legolas and Gimli actors have the same mean height.
- Ha: The Legolas and Gimli actors have different mean heights.
t.test(legolas, gimli, alternative="two.sided")
Welch Two Sample t-test
data: legolas and gimli
t = 20.886, df = 95.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
56.09942 67.88299
sample estimates:
mean of x mean of y
192.7535 130.7623
t.test(legolas, aragorn, alternative="two.sided")
Welch Two Sample t-test
data: legolas and aragorn
t = 4.2595, df = 88.021, p-value = 5.116e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.269111 17.234841
sample estimates:
mean of x mean of y
192.7535 181.0015
Try the “greater” alternative, since Legolas is an elf and is likely taller than both Gimli and Aragorn.
t.test(legolas,aragorn, alternative="greater")
Welch Two Sample t-test
data: legolas and aragorn
t = 4.2595, df = 88.021, p-value = 2.558e-05
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
7.165594 Inf
sample estimates:
mean of x mean of y
192.7535 181.0015
t.test(legolas,gimli, alternative="greater")
Welch Two Sample t-test
data: legolas and gimli
t = 20.886, df = 95.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
57.06146 Inf
sample estimates:
mean of x mean of y
192.7535 130.7623
Do you find evidence for significant differences?
The two-sided t-tests comparing Legolas actors to Aragorn actors and to Gimli actors both produced very small p-values (p = 5.116 × 10⁻⁵ and p < 2.2 × 10⁻¹⁶, respectively), so the null hypotheses of equal mean heights were rejected in both cases. The one-sided “greater” tests were also significant (p = 2.558 × 10⁻⁵ and p < 2.2 × 10⁻¹⁶). There is strong evidence that the mean height of the Legolas actors differs from that of both the Aragorn and Gimli actors, and that the Legolas actors are taller.
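Reading p-values directly from the test objects, instead of copying them from the printed output, helps keep the write-up consistent with the results. The following is a minimal sketch under stated assumptions: the seed (42) is an arbitrary choice that is not part of the original analysis, so the numbers it produces will differ from the output shown above. It also checks that, when t is positive, the “greater” p-value is half of the two-sided one.

```r
# Simulated heights; the seed is an assumption made only for reproducibility
set.seed(42)
aragorn <- rnorm(50, mean = 180, sd = 10)
legolas <- rnorm(50, mean = 195, sd = 15)

# Same Welch t-test, two different alternatives
two_sided <- t.test(legolas, aragorn, alternative = "two.sided")
greater   <- t.test(legolas, aragorn, alternative = "greater")

# Pull the p-values straight from the returned "htest" objects
p_two <- two_sided$p.value
p_gt  <- greater$p.value

# With a positive t statistic, the one-sided p-value is half the two-sided one
if (two_sided$statistic > 0) {
  print(all.equal(p_gt, p_two / 2))
}

# Format the value for reporting in the text
format(p_two, digits = 4, scientific = TRUE)
```

The same pattern applies to the Legolas vs. Gimli comparison by swapping in the gimli vector.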
Variance Test (F-Test) with var.test
Re-run the variance test (F-test) to compare the groups of Gimli and Legolas actors.
var.test(gimli, legolas)
F test to compare two variances
data: gimli and legolas
F = 0.73157, num df = 49, denom df = 49, p-value = 0.2774
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.4151507 1.2891710
sample estimates:
ratio of variances
0.7315738
Do these groups have different variance?
There is no evidence that these groups have different variances. The ratio of variances was 0.732, meaning the sample variance of the Gimli actor heights was approximately 27% smaller than that of the Legolas actor heights. However, p = 0.2774 and the 95% confidence interval for the ratio (0.415 to 1.289) includes 1, so the difference in variance is not significant.
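The F statistic reported by var.test() is the ratio of the two sample variances, which can be confirmed by hand. This is a minimal sketch on freshly simulated data; the seed (42) is an assumption, so the values will not match the output above.

```r
# Simulated heights; the seed is an assumption made only for reproducibility
set.seed(42)
gimli   <- rnorm(50, mean = 132, sd = 15)
legolas <- rnorm(50, mean = 195, sd = 15)

res <- var.test(gimli, legolas)

# The F statistic is the first sample variance divided by the second
f_by_hand <- var(gimli) / var(legolas)
all.equal(unname(res$statistic), f_by_hand)

# Does the 95% confidence interval for the ratio include 1?
ci_includes_1 <- res$conf.int[1] <= 1 && 1 <= res$conf.int[2]
ci_includes_1

# p-value taken from the test object, not retyped
res$p.value
```

A confidence interval that includes 1 goes together with a p-value above 0.05, so both lines lead to the same conclusion.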
Correlation Tests
Redo the correlation for the Sepal Length and Sepal Width for the Iris dataset, but for the three individual species.
iris = read.csv("iris.csv")
# Subset the data by species
setosa = subset(iris, Species == "setosa")
versicolor = subset(iris, Species == "versicolor")
virginica = subset(iris, Species == "virginica")
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)
Pearson's product-moment correlation
data: setosa$Sepal.Length and setosa$Sepal.Width
t = 7.6807, df = 48, p-value = 6.71e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5851391 0.8460314
sample estimates:
cor
0.7425467
cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)
Pearson's product-moment correlation
data: versicolor$Sepal.Length and versicolor$Sepal.Width
t = 4.2839, df = 48, p-value = 8.772e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2900175 0.7015599
sample estimates:
cor
0.5259107
cor.test(virginica$Sepal.Length, virginica$Sepal.Width)
Pearson's product-moment correlation
data: virginica$Sepal.Length and virginica$Sepal.Width
t = 3.5619, df = 48, p-value = 0.0008435
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2049657 0.6525292
sample estimates:
cor
0.4572278
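The three per-species tests can also be run in one pass. The sketch below assumes that R's built-in iris data frame is equivalent to the iris.csv file read above (same Sepal.Length, Sepal.Width, and Species columns); by_species and cors are illustrative names only.

```r
# Built-in iris data, assumed to match the contents of iris.csv
data(iris)

# One data frame per species
by_species <- split(iris, iris$Species)

# Pearson correlation of sepal length and width within each species
cors <- sapply(by_species, function(d) {
  unname(cor.test(d$Sepal.Length, d$Sepal.Width)$estimate)
})

# Matching p-values, taken from the same test
pvals <- sapply(by_species, function(d) {
  cor.test(d$Sepal.Length, d$Sepal.Width)$p.value
})

round(cors, 4)
signif(pvals, 4)
```

All three correlations are positive and significant, with setosa showing the strongest relationship between sepal length and sepal width.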
Chi-Squared Tests
Deer Caught Per Month
Using the deer dataset and the chisq.test() function, test if there are significant differences in the number of deer caught per month.
deer = read.csv("deer.csv")
str(deer)
'data.frame': 1182 obs. of 9 variables:
$ Farm : chr "AL" "AL" "AL" "AL" ...
$ Month : int 10 10 10 10 10 10 10 10 10 10 ...
$ Year : int 0 0 0 0 0 0 0 0 0 0 ...
$ Sex : int 1 1 1 1 1 1 1 1 1 1 ...
$ clas1_4: int 4 4 3 4 4 4 4 4 4 4 ...
$ LCT : num 191 180 192 196 204 190 196 200 197 208 ...
$ KFI : num 20.4 16.4 15.9 17.3 NA ...
$ Ecervi : num 0 0 2.38 0 0 0 1.21 0 0.8 0 ...
$ Tb : int 0 0 0 0 NA 0 NA 1 0 0 ...
table(deer$Month)
1 2 3 4 5 6 7 8 9 10 11 12
256 165 27 3 2 35 11 19 58 168 189 188
chisq.test(table(deer$Month))
Chi-squared test for given probabilities
data: table(deer$Month)
X-squared = 997.07, df = 11, p-value < 2.2e-16
Significance
The test was significant because the p-value was far below 0.05 (p < 2.2 × 10⁻¹⁶), indicating that the observed differences were much larger than would be expected by chance alone. Deer captures were not uniformly distributed throughout the year.
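With no probabilities supplied, chisq.test() compares the observed counts to equal expected counts in every month. The statistic can be reproduced by hand as a check. This sketch re-enters the monthly counts from the table above; counts and stat_by_hand are illustrative names.

```r
# Monthly capture counts copied from table(deer$Month)
counts <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)
names(counts) <- 1:12

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(counts) / length(counts), length(counts))

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
stat_by_hand <- sum((counts - expected)^2 / expected)
round(stat_by_hand, 2)

# Upper-tail p-value with (number of months - 1) degrees of freedom
pchisq(stat_by_hand, df = length(counts) - 1, lower.tail = FALSE)

# Same statistic from the built-in test
res <- chisq.test(counts)
all.equal(unname(res$statistic), stat_by_hand)
```

The hand calculation agrees with the X-squared value of 997.07 reported by chisq.test().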
Tuberculosis Distribution Among Farms
Test if the cases of tuberculosis are uniformly distributed across all farms.
table(deer$Farm, deer$Tb)
0 1
AL 10 3
AU 23 0
BA 67 5
BE 7 0
CB 88 3
CRC 4 0
HB 22 1
LCV 0 1
LN 28 6
MAN 27 24
MB 16 5
MO 186 31
NC 24 4
NV 18 1
PA 11 0
PN 39 0
QM 67 7
RF 23 1
RN 21 0
RO 31 0
SAL 0 1
SAU 3 0
SE 16 10
TI 9 0
TN 16 2
VISO 13 1
VY 15 4
chisq.test(table(deer$Farm, deer$Tb))
Warning in chisq.test(table(deer$Farm, deer$Tb)): Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: table(deer$Farm, deer$Tb)
X-squared = 129.09, df = 26, p-value = 1.243e-15
Significance
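No, tuberculosis cases are not uniformly distributed across all farms. The very low p-value (p = 1.243 × 10⁻¹⁵) indicates a significant association between farm and tuberculosis status. The warning that the chi-squared approximation may be incorrect arises because several farms have very few deer, so some expected cell counts are small; the p-value should therefore be treated as approximate.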
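One way to look into the chi-squared warning is to inspect the expected counts and, as a cross-check, request a Monte Carlo p-value through the simulate.p.value argument of chisq.test(). This is a minimal sketch, assuming the counts below are transcribed correctly from the farm-by-Tb table above; the seed and the number of replicates (B = 2000) are arbitrary choices for the sketch.

```r
# Counts transcribed from table(deer$Farm, deer$Tb)
farms <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV", "LN", "MAN",
           "MB", "MO", "NC", "NV", "PA", "PN", "QM", "RF", "RN", "RO",
           "SAL", "SAU", "SE", "TI", "TN", "VISO", "VY")
tb_neg <- c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11,
            39, 67, 23, 21, 31, 0, 3, 16, 9, 16, 13, 15)
tb_pos <- c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0,
            0, 7, 1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
tb_table <- cbind("0" = tb_neg, "1" = tb_pos)
rownames(tb_table) <- farms

# Standard test; the warning is suppressed here because it is examined below
res <- suppressWarnings(chisq.test(tb_table))

# Number of cells with an expected count below 5, the usual rule of thumb
sum(res$expected < 5)

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(42)  # arbitrary seed, an assumption of this sketch
sim <- chisq.test(tb_table, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

If the simulated p-value is also very small, the conclusion that tuberculosis status depends on farm does not hinge on the approximation flagged by the warning.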