One method for assessing the bioavailability of a drug is to note its concentration in blood and/or urine samples at certain periods of time after the drug is given. Suppose we want to compare the concentrations of two types of aspirin (types A and B) in urine specimens taken from the same person 1 hour after he or she has taken the drug. Hence, a specific dosage of either type A or type B aspirin is given at one time and the 1-hour urine concentration is measured.
One week later, after the first aspirin has presumably been cleared from the system, the same dosage of the other aspirin is given to the same person and the 1-hour urine concentration is noted. Because the order of giving the drugs may affect the results, a table of random numbers is used to decide which of the two types of aspirin to give first. This experiment is performed on 10 people; the results are given in the following table.
# Read in concentration data, 1 header row (default), code results displayed in html output (default)
dat<-read.csv("https://raw.githubusercontent.com/stephenratcliff/IE5342/main/AspirinConcentration.csv")
dat
##    Person Aspirin.A Aspirin.B
## 1       1        15        13
## 2       2        26        20
## 3       3        13        10
## 4       4        28        21
## 5       5        17        17
## 6       6        20        22
## 7       7         7         5
## 8       8        36        30
## 9       9        12         7
## 10     10        18        11
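For readers who cannot reach the CSV, the same table can be rebuilt directly in R. This is only a minimal sketch: the values are copied from the printed output above, and the names `dat_local` and `Diff` are illustrative assumptions, not part of the original analysis.

```r
# Rebuild the aspirin table from the printed values
# (dat_local is a hypothetical name used only for this sketch)
dat_local <- data.frame(
  Person    = 1:10,
  Aspirin.A = c(15, 26, 13, 28, 17, 20, 7, 36, 12, 18),
  Aspirin.B = c(13, 20, 10, 21, 17, 22, 5, 30, 7, 11)
)

# Within-person differences (A minus B): the quantity a paired analysis works with
dat_local$Diff <- dat_local$Aspirin.A - dat_local$Aspirin.B
dat_local
```

Keeping the difference as its own column makes it easy to see that most people show a higher concentration with type A.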
a. State a hypothesis
Suppose we want to test the hypothesis that the mean concentrations of the two drugs are the same in urine specimens. State the appropriate hypothesis.
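The null hypothesis is that the population mean 1-hour urine concentrations of the two aspirin types are equal. Because each person receives both types, it can equivalently be written in terms of the mean within-person difference µd:
H0: µA = µB
-OR-
µA - µB = µd = 0
The two-sided alternative is H1: µA ≠ µB, i.e. µd ≠ 0.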
b. Paired t-test
Test the hypothesis using a paired t-test, report the p-value, and state your conclusion (alpha = 0.05)
# paired t-test with default 95% confidence level
t.test(dat[,2], dat[,3], paired=TRUE)
##
## Paired t-test
##
## data: dat[, 2] and dat[, 3]
## t = 3.6742, df = 9, p-value = 0.005121
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.383548 5.816452
## sample estimates:
## mean of the differences
## 3.6
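Our p-value = 0.005121 < 0.05 = α, so we reject the null hypothesis at the 5% significance level and conclude that the mean concentrations of the two aspirin types differ. The 95% confidence interval for the mean difference (A minus B), 1.38 to 5.82, excludes 0, and the sample mean difference of 3.6 indicates a higher 1-hour urine concentration for type A.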
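The paired test is a one-sample t-test on the within-person differences, which can be checked by hand. The sketch below hard-codes the values from the data table; the variable names (`a`, `b`, `d`, `se_d`, and so on) are illustrative assumptions.

```r
# Values copied from the data table; a and b are illustrative names
a <- c(15, 26, 13, 28, 17, 20, 7, 36, 12, 18)
b <- c(13, 20, 10, 21, 17, 22, 5, 30, 7, 11)

d      <- a - b                               # within-person differences
n      <- length(d)
se_d   <- sd(d) / sqrt(n)                     # standard error of the mean difference
t_stat <- mean(d) / se_d                      # paired t statistic
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value on n - 1 df

# 95% confidence interval for the mean difference
ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se_d

round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

These quantities should agree with the `t.test(..., paired=TRUE)` output: t = 3.6742 on 9 degrees of freedom and p-value = 0.005121.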
c. Two-sample t-test
Suppose that you tested this hypothesis using a two-sample t-test (instead of a paired t-test). What would the p-value of your test have been?
# 2 sample t-test with default 95% confidence level
t.test(dat[,2], dat[,3], paired=FALSE)
##
## Welch Two Sample t-test
##
## data: dat[, 2] and dat[, 3]
## t = 0.9802, df = 17.811, p-value = 0.3401
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.12199 11.32199
## sample estimates:
## mean of x mean of y
## 19.2 15.6
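Without pairing, our p-value = 0.3401 > 0.05 = α and we fail to reject the null hypothesis. The 95% confidence interval for the difference in means, -4.12 to 11.32, contains 0, so analyzed this way the data are consistent with no difference. The estimated difference (19.2 - 15.6 = 3.6) is the same as in the paired test; the conclusion changes because the two-sample test ignores the pairing, so the large person-to-person variability inflates the standard error.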
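The gap between the two p-values comes entirely from the standard error. The sketch below, again with the table values hard-coded and with illustrative variable names, compares the two denominators and the correlation between each person's two readings.

```r
# Values copied from the data table; all names here are illustrative
a <- c(15, 26, 13, 28, 17, 20, 7, 36, 12, 18)
b <- c(13, 20, 10, 21, 17, 22, 5, 30, 7, 11)
n <- length(a)

se_paired <- sd(a - b) / sqrt(n)              # uses within-person differences only
se_welch  <- sqrt(var(a) / n + var(b) / n)    # treats the columns as independent samples
r_ab      <- cor(a, b)                        # correlation between a person's two readings

# Same numerator (difference in sample means), very different denominators
t_paired <- (mean(a) - mean(b)) / se_paired
t_welch  <- (mean(a) - mean(b)) / se_welch

round(c(se_paired = se_paired, se_welch = se_welch, cor = r_ab,
        t_paired = t_paired, t_welch = t_welch), 4)
```

Because the two readings from the same person are strongly positively correlated, subtracting them removes most of the between-person variation, which is why the paired standard error is so much smaller.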
Can active exercise shorten the time that it takes an infant to learn how to walk alone? Researchers randomly allocated 12 one-week-old male infants from white, middle-class families to one of two treatment groups. Those in the active exercise group received stimulation of the walking reflexes for four 3-minute sessions each day from the beginning of the second week through the end of the eighth week. Those in the other group received no such stimulation.
# Read in time to walk data
dat2<-read.csv("https://raw.githubusercontent.com/stephenratcliff/IE5342/main/TimeToWalkExercise.csv")
dat2
##   Active.Exercise No.Exercise
## 1            9.50       11.50
## 2           10.00       12.00
## 3            9.75       13.25
## 4            9.75       11.50
## 5            9.00       13.00
## 6           13.00        9.00
Is there sufficient evidence to conclude that the groups differ in the typical time required before first walking alone?
a. State the null and alternative hypothesis
The null hypothesis is that the two populations represented by the samples have the same mean time to learn to walk:
H0: µActive = µNo
-OR-
µActive - µNo = µd = 0
The alternative hypothesis is that the mean times to learn to walk in the two populations differ:
H1: µActive ≠ µNo
-OR-
µActive - µNo = µd ≠ 0
b. Why might you want to use a non-parametric method for analyzing this data?
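We might want to use a non-parametric method because each group contains only six infants and the two samples appear skewed in opposite directions, as shown in the normal probability plots below. With so little data the normality assumption cannot be verified, and a parametric t-test can be misleading when it fails. The Active exercise sample is skewed right (sample skewness 1.18), whereas the No exercise sample is skewed left (sample skewness -0.67); in both groups the skew is driven largely by observation 6.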
# Normal probability plots for the time to learn to walk for exercised vs. not exercised infants
library(car)
## Loading required package: carData
qqPlot(dat2[,1],main="Time to Learn to Walk - Active Exercise", xlab = "Standard Normal Quantiles", ylab = "time in months", col="steelblue", col.lines = "steelblue")
## [1] 6 5
qqPlot(dat2[,2],main="Time to Learn to Walk - No Exercise", xlab = "Standard Normal Quantiles", ylab = "time in months", col="firebrick2", col.lines = "firebrick2")
## [1] 6 3
library(e1071)
skewness(dat2$Active.Exercise, na.rm = FALSE, type = 3)
## [1] 1.183285
skewness(dat2$No.Exercise, na.rm = FALSE, type = 3)
## [1] -0.6663129
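The skewness values can also be reproduced in base R. The sketch below assumes that `type = 3` in `e1071::skewness` is the third central moment divided by the cubed sample standard deviation (the one with the n - 1 denominator); `skew_type3`, `active`, and `no_ex` are hypothetical names, and the data are copied from the table above.

```r
# Values copied from the printed table; names are illustrative
active <- c(9.50, 10.00, 9.75, 9.75, 9.00, 13.00)
no_ex  <- c(11.50, 12.00, 13.25, 11.50, 13.00, 9.00)

# Third central moment over the cubed sample standard deviation
# (assumed to correspond to e1071's type = 3 definition)
skew_type3 <- function(x) {
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

c(active = skew_type3(active), no_exercise = skew_type3(no_ex))
```

With six observations per group, a single extreme value dominates these estimates, so they are better read as a rough indication of asymmetry than as precise measurements.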
c. Analyze using the Mann-Whitney-U test using R with alpha=0.05
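Our p-value = 0.1705 > 0.05 = α (from the test output that follows), so we fail to reject the null hypothesis. Given the data collected, we cannot conclude that the typical time infants take to learn to walk differs between those who received active exercise and those who did not. Note that the Mann-Whitney U test compares the locations of the two distributions using ranks rather than comparing their means directly.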
# Run the Wilcoxon rank-sum test on two samples, a.k.a. the Mann-Whitney U test (two-sided by default)
?wilcox.test
## starting httpd help server ... done
wilcox.test(dat2$Active.Exercise, dat2$No.Exercise)
## Warning in wilcox.test.default(dat2$Active.Exercise, dat2$No.Exercise): cannot
## compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: dat2$Active.Exercise and dat2$No.Exercise
## W = 9, p-value = 0.1705
## alternative hypothesis: true location shift is not equal to 0
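The statistic W and the approximate p-value can be traced by hand from the ranks. The sketch below hard-codes the table values and uses illustrative names; the normal approximation with tie and continuity corrections is written out here as an assumption about what `wilcox.test` does when ties rule out an exact p-value.

```r
# Values copied from the printed table; names are illustrative
active <- c(9.50, 10.00, 9.75, 9.75, 9.00, 13.00)
no_ex  <- c(11.50, 12.00, 13.25, 11.50, 13.00, 9.00)
n1 <- length(active)
n2 <- length(no_ex)

# Rank all 12 observations together; tied values share the average of their ranks
r <- rank(c(active, no_ex))

# W: rank sum of the first sample minus its smallest possible value
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Normal approximation with a correction for ties and a continuity correction
ties  <- table(r)
sigma <- sqrt(n1 * n2 / 12 *
              ((n1 + n2 + 1) - sum(ties^3 - ties) / ((n1 + n2) * (n1 + n2 - 1))))
dev   <- W - n1 * n2 / 2
z     <- (dev - sign(dev) * 0.5) / sigma
p_val <- 2 * pnorm(-abs(z))

# Cross-check against the built-in test (the ties warning is expected, so it is suppressed)
W_builtin <- unname(suppressWarnings(wilcox.test(active, no_ex))$statistic)

c(W = W, W_builtin = W_builtin, p = round(p_val, 4))
```

W = 9 is well below its null expectation of n1 * n2 / 2 = 18, meaning the active exercise group tends to have the smaller ranks, but with six infants per group this is not enough to reach significance.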
Here we display the complete R code used in this analysis.
# Read in concentration data, 1 header row (default), code results displayed in html output (default)
dat<-read.csv("https://raw.githubusercontent.com/stephenratcliff/IE5342/main/AspirinConcentration.csv")
dat
# paired t-test with default 95% confidence level
t.test(dat[,2], dat[,3], paired=TRUE)
# 2 sample t-test with default 95% confidence level
t.test(dat[,2], dat[,3], paired=FALSE)
# Read in time to walk data
dat2<-read.csv("https://raw.githubusercontent.com/stephenratcliff/IE5342/main/TimeToWalkExercise.csv")
dat2
# Normal probability plots for the time to learn to walk for exercised vs. not exercised infants
library(car)
qqPlot(dat2[,1],main="Time to Learn to Walk - Active Exercise", xlab = "Standard Normal Quantiles", ylab = "time in months", col="steelblue", col.lines = "steelblue")
qqPlot(dat2[,2],main="Time to Learn to Walk - No Exercise", xlab = "Standard Normal Quantiles", ylab = "time in months", col="firebrick2", col.lines = "firebrick2")
library(e1071)
skewness(dat2$Active.Exercise, na.rm = FALSE, type = 3)
skewness(dat2$No.Exercise, na.rm = FALSE, type = 3)
# Run the Wilcoxon rank-sum test on two samples, a.k.a. the Mann-Whitney U test (two-sided by default)
?wilcox.test
wilcox.test(dat2$Active.Exercise, dat2$No.Exercise)