Dalgaard Chapter 5 Functions – Alex Crawford

5.1 One Sample t-Test

Data Set-up
Data for energy intake in kJ for 11 women:

daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 
    8770)

Summary Statistics

mean(daily.intake)
## [1] 6754
sd(daily.intake)
## [1] 1142
quantile(daily.intake)
##   0%  25%  50%  75% 100% 
## 5260 5910 6515 7515 8770
qqnorm(daily.intake, main = "Q-Q Plot of Daily Intake")
qqline(daily.intake)


shapiro.test(daily.intake)
## 
##  Shapiro-Wilk normality test
## 
## data:  daily.intake 
## W = 0.9524, p-value = 0.6743
# No evidence against normality (p = 0.67), so a t-test is reasonable.

Question: Does the energy intake for this group of women deviate systematically from the recommended value of 7725?

# We can use a one-sample t-test to find out.
# mu = sets the mean assumed under the null hypothesis.
# alternative = 'greater' or 'less' requests a one-sided test.
# conf.level = sets the confidence level of the interval (default 0.95).
t.test(daily.intake, mu = 7725)
## 
##  One Sample t-test
## 
## data:  daily.intake 
## t = -2.821, df = 10, p-value = 0.01814
## alternative hypothesis: true mean is not equal to 7725 
## 95 percent confidence interval:
##  5986 7521 
## sample estimates:
## mean of x 
##      6754
# Since the p-value is 0.01814 (< 0.05), we reject the null hypothesis at
# the 5% level: mean intake differs systematically from 7725 kJ.  The 95%
# confidence interval (5986 to 7521) lies entirely below 7725.
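
To see where these numbers come from, the statistic can be rebuilt by hand. This is a minimal sketch; the variable names (mu0, se, t_manual and so on) are chosen here for illustration and are not part of the original notes.

```r
# Same energy-intake data as above (kJ, 11 women)
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230,
    8770)
mu0 <- 7725                       # recommended value (null-hypothesis mean)

n  <- length(daily.intake)
se <- sd(daily.intake) / sqrt(n)  # standard error of the mean

# t statistic: distance of the sample mean from mu0 in standard-error units
t_manual <- (mean(daily.intake) - mu0) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_manual), df = n - 1)

# One-sided p-value, as given by alternative = "less" (true mean below mu0)
p_less <- pt(t_manual, df = n - 1)

round(c(t = t_manual, p.two.sided = p_two_sided, p.less = p_less), 4)
```

The first two values should agree with the t and p-value printed by t.test() above. Because the observed t is negative, the one-sided "less" p-value is half the two-sided one.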

5.2 Wilcoxon signed-rank test

Although t-tests are fairly “robust” against departures from the normal distribution, especially with larger samples, it may be advisable to use a signed-rank test instead.
The Wilcoxon signed-rank test assumes only that the distribution is symmetric around its center (so that mean = median), not that it is normal.
Wilcoxon is also called “non-parametric” (or distribution-free) because it does not assume a particular parametric family for the data.
A two-sided test cannot be significant at the 5% level if n < 6.
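
The n < 6 limit follows from counting. With n non-zero, untied differences, the most extreme outcome (all differences on the same side) has probability 1/2^n under the null hypothesis, so the smallest possible two-sided p-value is 2/2^n. A short sketch to tabulate this, using the signed-rank distribution function psignrank() from base R (the names n and min_p are just illustrative):

```r
# Smallest attainable two-sided p-value for sample sizes 1 to 8.
# psignrank(0, n) is P(V <= 0) = 1 / 2^n under the null hypothesis.
n <- 1:8
min_p <- 2 * psignrank(0, n)

# One row per sample size, flagging whether 5% significance is reachable
data.frame(n = n, min.p = min_p, can.reach.5pct = min_p < 0.05)
```

For n = 5 the minimum is 2/32 = 0.0625, which is still above 0.05; for n = 6 it drops to 2/64 = 0.03125.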

wilcox.test(daily.intake, mu = 7725)
## Warning: cannot compute exact p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  daily.intake 
## V = 8, p-value = 0.0293
## alternative hypothesis: true location is not equal to 7725
# The 'ties' warning means that some differences have the same absolute
# value.  They cannot be ranked against each other, so each is assigned
# the average rank and the p-value is approximate rather than exact.
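
The V statistic is the sum of the ranks belonging to the positive differences. A small sketch that rebuilds it from the data above (d, r and V are illustrative names); rank() gives tied values their average rank by default, which is the behaviour the warning refers to:

```r
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230,
    8770)

d <- daily.intake - 7725   # signed differences from the hypothesised value
r <- rank(abs(d))          # ranks of |d|; tied values share an average rank
V <- sum(r[d > 0])         # sum of the ranks of the positive differences

# The two observations of 7515 give identical differences and share rank 1.5
tied <- duplicated(abs(d)) | duplicated(abs(d), fromLast = TRUE)
r[tied]
V
```

Only two intakes exceed 7725 (8230 and 8770), with ranks 3 and 5, so V = 8, matching the output of wilcox.test().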

5.3 Two-sample t-Test

For testing whether two samples come from populations with the same mean. Each sample is still assumed to be normally distributed.
There are two ways to calculate the standard error of the difference of the means: one assumes the two samples have the same variance, while the other (the Welch method) does not. Welch is the default in t.test(); set var.equal = TRUE for the equal-variance version.
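
The two standard errors can be written out directly. The sketch below uses simulated numbers, not the diatom data (the group sizes, means and standard deviations are made up for illustration), and checks the hand calculation against t.test():

```r
set.seed(1)                         # simulated data, NOT the diatom counts
x <- rnorm(35, mean = 24, sd = 7)   # hypothetical group 1
y <- rnorm(17, mean = 19, sd = 5)   # hypothetical group 2
n1 <- length(x)
n2 <- length(y)

# Welch: each group keeps its own variance
se_welch <- sqrt(var(x) / n1 + var(y) / n2)

# Pooled: one common variance, weighted by each group's degrees of freedom
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se_pooled <- sqrt(s2_pooled * (1 / n1 + 1 / n2))

t_welch  <- (mean(x) - mean(y)) / se_welch
t_pooled <- (mean(x) - mean(y)) / se_pooled

welch  <- t.test(x, y)                    # default: var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)  # classical equal-variance test

c(t.welch = t_welch, t.pooled = t_pooled)
c(df.welch = unname(welch$parameter), df.pooled = unname(pooled$parameter))
```

The pooled test always has n1 + n2 - 2 degrees of freedom, while the Welch test uses an approximate, usually fractional, value that is never larger.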

diatom <- read.csv("/Users/telekineticturtle/Desktop/Colorado 13/Quant Methods/Data/diatoms_NBP0003_LarsenA.csv", 
    header = TRUE)
diatomB <- subset(diatom, Unit == "B")
diatomA <- subset(diatom, Unit == "A")
diatomC <- subset(diatom, Unit == "C")
diatomT <- subset(diatom, Unit == "T")

# Data are roughly normally distributed.
# shapiro.test(diatomB$perFragil)
# hist(diatomB$perFragil, main = 'Unit B Percent Fragilariopsis', xlab = 'Percent Fragilariopsis')
# shapiro.test(diatomC$perFragil)
# hist(diatomC$perFragil, main = 'Unit C Percent Fragilariopsis', xlab = 'Percent Fragilariopsis')

# Two-sample t-test: Null: Fragilariopsis make up the same percentage of
# the diatom assemblages in Unit B and Unit C.  Any difference in mean %
# Fragilariopsis is due to random chance.
t.test(diatomB$perFragil, diatomC$perFragil)
## 
##  Welch Two Sample t-test
## 
## data:  diatomB$perFragil and diatomC$perFragil 
## t = 2.527, df = 39.74, p-value = 0.01559
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  0.9852 8.8652 
## sample estimates:
## mean of x mean of y 
##     24.35     19.42

This test shows that the mean % Fragilariopsis differs significantly between the two units (p = 0.016); if the true means were equal, we would see a difference this large less than 5% of the time. This is consistent with the claim that the two units were formed in different depositional environments.

5.4 Comparison of Variances

Use var.test() to check whether the variances could be the same. For the equal-variance t-test, we prefer that the null hypothesis cannot be rejected (that the test is not significant).
Warning: Only apply to independent groups, not to paired data.

# Null: The variances of % Fragilariopsis in Units B and C are the same.
var.test(diatomB$perFragil, diatomC$perFragil)
## 
##  F test to compare two variances
## 
## data:  diatomB$perFragil and diatomC$perFragil 
## F = 1.638, num df = 34, denom df = 16, p-value = 0.2938
## alternative hypothesis: true ratio of variances is not equal to 1 
## 95 percent confidence interval:
##  0.6448 3.6362 
## sample estimates:
## ratio of variances 
##              1.638

Good news! We cannot reject the null hypothesis (p = 0.29), so the true variances could be equal.
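
The F statistic is simply the ratio of the two sample variances. A minimal sketch with simulated numbers (again not the diatom data; the names F_manual, df1, df2 and p_manual are illustrative) that reproduces what var.test() reports:

```r
set.seed(2)                         # simulated data, NOT the diatom counts
x <- rnorm(35, mean = 24, sd = 7)   # hypothetical group 1
y <- rnorm(17, mean = 19, sd = 5)   # hypothetical group 2

F_manual <- var(x) / var(y)         # ratio of the two sample variances
df1 <- length(x) - 1                # numerator degrees of freedom
df2 <- length(y) - 1                # denominator degrees of freedom

# Two-sided p-value: double the smaller tail of the F distribution
p_lower  <- pf(F_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

c(F = F_manual, p = p_manual)
```

With 35 and 17 observations the degrees of freedom are 34 and 16, the same as in the diatom output above.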

5.5 Two-Sample Wilcoxon Test

Same basic idea as the one-sample Wilcoxon, but the ranks are taken over both samples combined.

# Null: % Fragilariopsis has the same distribution in Unit B and Unit C
# (no location shift).  Any difference in the ranks is due to random
# chance.
wilcox.test(diatomB$perFragil, diatomC$perFragil)
## Warning: cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  diatomB$perFragil and diatomC$perFragil 
## W = 413, p-value = 0.02487
## alternative hypothesis: true location shift is not equal to 0

The Wilcoxon test concludes the same as the t-test. The two units are different in terms of Fragilariopsis fraction.
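
For reference, the W statistic can be rebuilt from the joint ranking. The sketch below uses simulated values rather than the diatom data, and the variable names are illustrative: rank all observations together, sum the ranks of the first sample, and subtract the smallest value that sum could take.

```r
set.seed(3)                         # simulated data, NOT the diatom counts
x <- rnorm(35, mean = 24, sd = 7)   # hypothetical group 1
y <- rnorm(17, mean = 19, sd = 5)   # hypothetical group 2

r  <- rank(c(x, y))                 # ranks in the combined sample
n1 <- length(x)

# Rank sum of the first sample minus its minimum possible value n1(n1 + 1)/2
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

W
wilcox.test(x, y)$statistic         # same value, as reported by R
```

W near 0 or near n1 * n2 means the two samples barely overlap; values near n1 * n2 / 2 are what the null hypothesis predicts.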

5.6 The Paired t-Test

This is like a one-sample t-test applied to the differences within pairs: each observation is compared with its own matched (e.g. earlier) measurement. It's useful for before/after comparisons on the same units (e.g. surface albedo at the same sites in July 1989 v. July 2019).
t.test(PRE, POST, paired=TRUE) is the format.
paired=TRUE is what distinguishes a paired t-test from a two-sample t-test.
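
The link to the one-sample test can be checked directly. This sketch uses simulated before/after values (pre and post are hypothetical, not from the diatom file) and compares three calls:

```r
set.seed(4)                                  # simulated data for illustration
pre  <- rnorm(20, mean = 50, sd = 10)        # hypothetical "before" values
post <- pre + rnorm(20, mean = 3, sd = 4)    # each "after" value depends on its "before"

paired     <- t.test(pre, post, paired = TRUE)  # paired t-test
one_sample <- t.test(pre - post, mu = 0)        # one-sample test on the differences
unpaired   <- t.test(pre, post)                 # Welch test that ignores the pairing

c(paired = paired$p.value, one.sample = one_sample$p.value,
  unpaired = unpaired$p.value)
```

The first two p-values are identical. The unpaired test generally gives a different answer because it treats the large between-unit variation as noise instead of cancelling it out within each pair.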

# Let's use the diatom data to simulate this (even though it's not a time
# series).  Null: The pre mean is the same as the post mean. Any observed
# difference is from random chance.
t.test(diatomB$perChaet, diatomB$perFragil, paired = TRUE)
## 
##  Paired t-test
## 
## data:  diatomB$perChaet and diatomB$perFragil 
## t = 11.59, df = 34, p-value = 2.347e-13
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  30.24 43.11 
## sample estimates:
## mean of the differences 
##                   36.67

Obviously, this was going to turn up significant, but you get the idea.

5.7 The Matched-Pairs Wilcoxon Test

This is the non-parametric version of the paired t-test. Use the same arguments as above:
wilcox.test(pre, post, paired=TRUE)

# Null: The paired differences are symmetric around zero (no location
# shift between pre and post).  Any observed difference is from random
# chance.
wilcox.test(diatomB$perChaet, diatomB$perFragil, paired = TRUE)
## 
##  Wilcoxon signed rank test
## 
## data:  diatomB$perChaet and diatomB$perFragil 
## V = 630, p-value = 5.821e-11
## alternative hypothesis: true location shift is not equal to 0
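
As with the paired t-test, the matched-pairs Wilcoxon test is the one-sample signed-rank test applied to the within-pair differences. A brief check on simulated values (pre and post are hypothetical):

```r
set.seed(5)                                  # simulated data for illustration
pre  <- rnorm(15, mean = 50, sd = 10)        # hypothetical "before" values
post <- pre + rnorm(15, mean = 3, sd = 4)    # matched "after" values

paired_w     <- wilcox.test(pre, post, paired = TRUE)  # matched-pairs test
one_sample_w <- wilcox.test(pre - post, mu = 0)        # signed-rank test on differences

c(V.paired = unname(paired_w$statistic),
  V.one.sample = unname(one_sample_w$statistic))
c(p.paired = paired_w$p.value, p.one.sample = one_sample_w$p.value)
```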