The developers of a training program designed to improve manual dexterity claim that people who complete the 6-week program will increase their manual dexterity. A random sample of 12 people enrolled in the training program was selected. A measure of each person’s dexterity on a scale from 1 (lowest) to 9 (highest) was recorded just before the start of and just after the completion of the 6-week program. The data are shown in the table below.
## before after
## A 6.7 7.8
## B 5.4 5.9
## C 7.0 7.6
## D 6.6 6.6
## E 6.9 7.6
## F 7.0 7.7
## G 5.5 6.0
## H 7.1 7.0
## I 7.9 7.8
## J 5.9 6.4
## K 8.4 8.7
## L 6.5 6.5
We have before-and-after dexterity scores, and we’re interested in knowing whether we should be convinced that this dexterity program really works, not just for this sample of people but more generally. More specifically, if \(\mu\) is the mean improvement, then \(H_0: \mu = 0\) and \(H_a: \mu > 0\).
First let’s calculate the individual improvements, mean improvement and standard deviation in improvements in our sample:
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before
diff
## [1] 1.1 0.5 0.6 0.0 0.7 0.7 0.5 -0.1 -0.1 0.5 0.3 0.0
mean(diff); sd(diff)
## [1] 0.3916667
## [1] 0.3776924
With significance testing, the question we’re trying to answer is:
“How probable are results this far or further from the null hypothesis, if the null hypothesis is true?”
To answer this we could try to simulate results using the null hypothesis (\(\mu = 0\)) and use the sample standard deviation, 0.3776924, and see how often the mean improvement is greater than 0.3916667.
set.seed(4)
sims <- replicate(10000, rnorm(12, 0, sd=sd(diff)))
sum(apply(sims, 2, mean) > mean(diff))
## [1] 1
We can see that the mean improvement we saw in our sample of 12 people would be very unlikely if the true mean improvement were 0 and the true standard deviation in improvement were 0.3776924. In fact, it happened only once in 10,000 simulations. However, we don’t know whether this is the actual standard deviation in improvement. If the standard deviation in individual improvements is greater, we’ll see a wider range of results for our simulated groups of 12. Let’s see what happens if we have a standard deviation in individual improvements of 0.7.
set.seed(4)
sims <- replicate(10000, rnorm(12, 0, sd=0.7))
sum(apply(sims, 2, mean) >= mean(diff))
## [1] 238
Now, a mean as far above zero as the one we observed occurs 238 times in our 10,000 simulations. But how unlikely is it that the true standard deviation is 0.7, given that we observed a standard deviation of 0.3776924 in our sample? Once again, this is a question that we could attempt to answer with simulations. We’ll simulate 10,000 groups of 12 people using a standard deviation in dexterity improvements of 0.7 and see how often we observe a standard deviation as small as (or smaller than) the one we actually observed.
set.seed(4)
sims <- replicate(10000, rnorm(12, 0, sd=0.7))
sum(apply(sims, 2, sd) <= sd(diff))
## [1] 102
We see that we observe a standard deviation of 0.3776924 or smaller in 102 of our 10,000 simulations when the true standard deviation is 0.7.
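The two simulations above can be combined into a single sweep over several candidate standard deviations. The following is only a minimal sketch: the candidate values in `sigmas` and the helper name `tail.props` are assumptions chosen for illustration, not part of the original analysis.

```r
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before

# Hypothetical helper: for a candidate true sd, simulate groups of 12 under
# the null hypothesis (mean 0) and return two proportions:
#   mean.as.large: share of simulated means at least as large as observed
#   sd.as.small:   share of simulated sds at least as small as observed
tail.props <- function(sigma, n.sims = 10000) {
  sims <- replicate(n.sims, rnorm(length(diff), 0, sd = sigma))
  c(sigma = sigma,
    mean.as.large = mean(colMeans(sims) >= mean(diff)),
    sd.as.small = mean(apply(sims, 2, sd) <= sd(diff)))
}

set.seed(4)
sigmas <- c(0.4, 0.7, 1.0)  # assumed candidate values, for illustration only
results <- t(sapply(sigmas, tail.props))
results
```

As the candidate standard deviation grows, a mean as large as ours becomes easier to produce by chance, but a sample standard deviation as small as ours becomes harder to produce. That tension is the reason a single simulation with a fixed standard deviation cannot settle the question.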
Hopefully, you’ve begun to see that the question we are trying to answer is more challenging than it at first appears. The trick is that we are trying to grapple with two types of uncertainty simultaneously: uncertainty in the mean improvement and uncertainty in the standard deviation in improvement. We can’t make proper inferences about the mean improvement without thinking about the standard deviation in improvement. In 1908, W. S. Gosset introduced the t-distribution (and the t-test) to handle problems of this type. Here’s how it goes:
In our case we want the mean dexterity improvement and the standard deviation in the dexterity improvements, and we can calculate them just as we did above. At this point we should also stop and take a look at the differences themselves. The t-test depends on these values following a normal distribution. In practice, we often cannot know whether a population is normally distributed based on a sample of 12 values. However, if our data are obviously not normally distributed, then the t-test is not the proper test and we should stop here and consider other options.
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before
mean(diff); sd(diff)
## [1] 0.3916667
## [1] 0.3776924
stem(diff)
##
## The decimal point is at the |
##
## -0 | 11
## 0 | 003
## 0 | 555677
## 1 | 1
Our dexterity improvements are not obviously non-normal, and normally distributed dexterity improvement seems plausible enough, so we can continue to step 2.
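A normal quantile-quantile comparison is another common way to look at the same question. The sketch below is an optional supplement to the stem plot, not a step in the original procedure; the variable name `qq.cor` is an assumption, and with only 12 values it can flag only gross departures from normality.

```r
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before

# qqnorm pairs each observed improvement (y) with the normal quantile
# expected for its rank (x); plot.it = FALSE returns the pairs without plotting
qq <- qqnorm(diff, plot.it = FALSE)

# Correlation between theoretical and observed quantiles; values near 1
# mean the points fall close to a straight line
qq.cor <- cor(qq$x, qq$y)
qq.cor

# To draw the plot itself, run: qqnorm(diff); qqline(diff)
```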
Here we need to think back to what we learned about random variables. Recall that if our random variable, X, has a standard deviation \(\sigma\), then the standard deviation of the average of n draws from this random variable is \(\frac{\sigma}{\sqrt{n}}\). Put another way, the more trials you perform, the less uncertainty there is in the mean result. More specifically, the uncertainty in the average goes down with the square root of the number of trials. We can use this fact to calculate the uncertainty (or “standard error”) in the mean improvement, using the sample standard deviation as our estimate of \(\sigma\).
n <- 12
SE.diff <- sd(diff)/sqrt(n)
SE.diff
## [1] 0.1090304
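The \(\frac{\sigma}{\sqrt{n}}\) rule can be checked by simulation. This is a minimal sketch that assumes a true standard deviation of 0.7 (the same illustrative value used in the earlier simulations); the names `sigma` and `sd.of.means` are introduced here for illustration.

```r
set.seed(4)
sigma <- 0.7  # assumed true sd, for illustration only
n <- 12

# Simulate 10,000 groups of n draws and take the mean of each group
sims <- replicate(10000, rnorm(n, 0, sd = sigma))
sd.of.means <- sd(colMeans(sims))

# The spread of the simulated means should be close to sigma / sqrt(n)
c(simulated = sd.of.means, theoretical = sigma / sqrt(n))
```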
A t-statistic is just like a z-score. It represents how many standard errors above or below the null hypothesis our result was. To calculate it, we just need to take the difference between our mean value and the mean value expected under our null hypothesis and divide this difference by the standard error of the mean.
t.statistic <- (mean(diff)- 0)/SE.diff
t.statistic
## [1] 3.592271
If there were only uncertainty in the mean value and the distribution of sample means were normal, we could now turn to the standard normal table and calculate how often we would see a t-statistic this large or larger by chance:
1-pnorm(t.statistic) #what you shouldn't do!
## [1] 0.0001639046
Recall, however, that we have a second source of uncertainty: the uncertainty in the standard deviation! This uncertainty increases the chance that we would observe a sample mean this far or further above 0, and to adjust for this we use a distribution with fatter tails than the standard normal distribution: the t-distribution.
The t-distribution is really a family of distributions. The fewer observations we have, the fatter the tails. There is a different t-distribution for every number of degrees of freedom (df), where \(df = \text{observations} - 1\). The following graph shows a standard normal distribution (black) along with t-distributions with 3 (red), 6 (blue) and 11 (green) degrees of freedom.
x <- seq(-4,4, .001)
plot(x, dnorm(x), type="l")
points(x, dt(x, df=3), type="l", col="red")
points(x, dt(x, df=6), type="l", col="blue")
points(x, dt(x, df=11), type="l", col="green")
Notice how, as the number of degrees of freedom (and thus the number of observations) increases, the t-distribution gets closer and closer to the standard normal distribution.
To find the probability of getting a t-statistic as far or further above zero as the one we calculated based on 12 observations, we can use a t-distribution with 11 degrees of freedom. Note that since pt returns the area from the left tail of the t-distribution up to our t-statistic, we subtract this value from 1 to get the area to the right of our t-statistic.
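The same convergence can be seen numerically by comparing tail areas. The sketch below uses a cutoff of 2 and a handful of df values; both are arbitrary choices made for illustration.

```r
dfs <- c(3, 6, 11, 30, 100)  # assumed df values, for illustration only

# Area to the right of 2 under each t-distribution and under the normal
tail.t <- pt(2, df = dfs, lower.tail = FALSE)
tail.norm <- pnorm(2, lower.tail = FALSE)

data.frame(df = dfs, t.tail = tail.t, normal.tail = tail.norm)
```

The t tail areas shrink toward the normal tail area as df increases, but stay above it for every finite df.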
1-pt(t.statistic, df=11)
## [1] 0.002113387
This number, 0.002113387, is our p-value and it means that there is roughly a 0.2% chance that we would observe a mean dexterity improvement this far or further above zero in our sample if the true mean improvement of participants was zero.
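We can return to the simulation approach from earlier to check this p-value. Rather than fixing the standard deviation, the sketch below computes a t-statistic for each simulated group, so each group is scaled by its own sample standard deviation. The true standard deviation of 1 used here is an arbitrary assumption; because the t-statistic rescales each sample, the choice should not matter.

```r
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before
n <- length(diff)
t.observed <- mean(diff) / (sd(diff) / sqrt(n))

set.seed(4)
# Simulate 100,000 groups of 12 under the null hypothesis (mean 0) and
# compute the t-statistic of each group from its own mean and sd
sim.t <- replicate(100000, {
  x <- rnorm(n, mean = 0, sd = 1)  # sd = 1 is an arbitrary assumption
  mean(x) / (sd(x) / sqrt(n))
})

# Share of simulated t-statistics at least as large as the observed one
sim.p <- mean(sim.t >= t.observed)
c(simulated = sim.p,
  t.distribution = pt(t.observed, df = n - 1, lower.tail = FALSE))
```

The simulated proportion should land near the value from the t-distribution, and noticeably above the value from the standard normal distribution.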
R, of course, can calculate this for us in one step:
t.test(diff, alternative="greater")
##
## One Sample t-test
##
## data: diff
## t = 3.5923, df = 11, p-value = 0.002113
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 0.1958606 Inf
## sample estimates:
## mean of x
## 0.3916667
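Because each person was measured twice, the same test can also be requested as a paired t-test on the two original vectors. This is a sketch of an equivalent call using the `paired` argument of `t.test`; the variable names `paired.test` and `one.sample` are introduced here for illustration.

```r
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before

# A paired test works on the within-person differences (after - before),
# so it should agree with the one-sample test on diff
paired.test <- t.test(after, before, paired = TRUE, alternative = "greater")
one.sample <- t.test(diff, alternative = "greater")

c(paired = paired.test$p.value, one.sample = one.sample$p.value)
```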
As always, if you want to find out more about how a function works, use the question mark: “?t.test”. What if the alternative hypothesis were \(H_a: \mu \neq 0\) rather than \(H_a: \mu > 0\)? Our t-statistic would be the same, but our p-value would be twice as large:
t.test(diff, alternative="two.sided")
##
## One Sample t-test
##
## data: diff
## t = 3.5923, df = 11, p-value = 0.004227
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.1516924 0.6316409
## sample estimates:
## mean of x
## 0.3916667
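The two-sided p-value and the 95 percent confidence interval reported by `t.test` can be reproduced by hand from the pieces we already have. This is a minimal sketch; the names `p.two.sided` and `ci` are introduced here for illustration.

```r
before <- c(6.7, 5.4, 7.0, 6.6, 6.9, 7.0, 5.5, 7.1, 7.9, 5.9, 8.4, 6.5)
after <- c(7.8, 5.9, 7.6, 6.6, 7.6, 7.7, 6.0, 7.0, 7.8, 6.4, 8.7, 6.5)
diff <- after - before
n <- length(diff)
SE.diff <- sd(diff) / sqrt(n)
t.statistic <- (mean(diff) - 0) / SE.diff

# Two-sided p-value: double the area beyond |t| in one tail
p.two.sided <- 2 * pt(abs(t.statistic), df = n - 1, lower.tail = FALSE)

# 95% confidence interval: sample mean plus or minus the 97.5th percentile
# of the t-distribution times the standard error
ci <- mean(diff) + c(-1, 1) * qt(0.975, df = n - 1) * SE.diff

p.two.sided
ci
```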
An independent random sample is selected from an approximately normal population with an unknown standard deviation. Find the p-value for the given set of hypotheses and t-test statistic. Also determine if the null hypothesis would be rejected at \(\alpha = 0.05\).
1- pt(1.91, df=10)
## [1] 0.04260244
Since our p-value is less than \(\alpha = 0.05\), we reject the null hypothesis.
pt(-3.45, df=16)
## [1] 0.001646786
Since our p-value is less than \(\alpha = 0.05\), we reject the null hypothesis.
2*(1- pt(0.83, df=6)) #note that we double this value because this is a two-tailed test
## [1] 0.4383084
Since our p-value is greater than \(\alpha = 0.05\), we do not reject the null hypothesis.
1-pt(2.13, df=27)
## [1] 0.02121769
Since our p-value is less than \(\alpha = 0.05\), we reject the null hypothesis.
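The four calculations above follow one pattern, which can be wrapped in a small helper. The function `p.from.t` below is hypothetical (it is not part of base R), and the direction of each alternative is inferred from the calls above: `1 - pt(...)` for an upper-tail test, `pt(...)` for a lower-tail test, and the doubled tail for a two-sided test.

```r
# Hypothetical helper: p-value for a t-statistic with the given degrees of
# freedom and direction of the alternative hypothesis
p.from.t <- function(t.stat, df,
                     alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         greater = pt(t.stat, df, lower.tail = FALSE),   # area to the right
         less = pt(t.stat, df),                          # area to the left
         two.sided = 2 * pt(abs(t.stat), df, lower.tail = FALSE))
}

alpha <- 0.05
p.values <- c(p.from.t(1.91, df = 10, alternative = "greater"),
              p.from.t(-3.45, df = 16, alternative = "less"),
              p.from.t(0.83, df = 6, alternative = "two.sided"),
              p.from.t(2.13, df = 27, alternative = "greater"))

# TRUE where the null hypothesis would be rejected at alpha = 0.05
data.frame(p.value = p.values, reject = p.values < alpha)
```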
A \(95 \%\) confidence interval for a population mean, \(\mu\), is given as (18.985, 21.015). This confidence interval is based on a simple random sample of 36 observations. Calculate the sample mean and standard deviation. Assume that all conditions necessary for inference are satisfied. Use the t-distribution in any calculations.
The sample mean must be in the middle of the confidence interval, so:
(18.985+21.015)/2
## [1] 20
Since there are 36 observations, we are using a t-distribution with 35 degrees of freedom. What t-statistics fall at the 2.5% and 97.5% percentiles of this distribution (the edges of the 95% confidence interval)? We can use the qt function to find out:
qt(0.025, df=35); qt(0.975, df=35)
## [1] -2.030108
## [1] 2.030108
In other words, our confidence interval stretches from 2.0301079 standard errors below the mean to 2.0301079 standard errors above it. Put another way, our confidence interval is 4.0602159 standard errors wide. We can therefore divide the width of our confidence interval by 4.0602159 (rounded to 4.06 in the calculation below) to find the standard error.
(21.015-18.985)/4.06
## [1] 0.5
Our last step is to calculate the sample standard deviation. Remember that \(SE = \frac{\sigma}{\sqrt{n}}\), so we simply need to multiply our standard error by the square root of the number of observations to get the standard deviation.
0.5*sqrt(36)
## [1] 3
In other words, our sample values must have had an average of 20 with a standard deviation of 3.
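The whole back-calculation can be written as one short script that avoids rounding the t-value by hand. This is a sketch under the same assumptions as the exercise (a 95% interval built from a t-distribution with 35 degrees of freedom); the variable names are introduced here for illustration.

```r
lower <- 18.985
upper <- 21.015
n <- 36

# The sample mean is the midpoint of the interval
x.bar <- (lower + upper) / 2

# Half the interval width equals t* times the standard error
t.star <- qt(0.975, df = n - 1)
SE <- (upper - lower) / (2 * t.star)

# Undo the division by sqrt(n) to recover the sample standard deviation
s <- SE * sqrt(n)

c(mean = x.bar, SE = SE, sd = s)
```

Using the unrounded t-value gives a standard error and standard deviation very slightly below 0.5 and 3; the difference comes only from rounding in the reported interval and in 4.06.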
In the textbook:
Read: 7.1.1 to 7.1.5 (p. 312 - 319) and do problems 7.4 and 7.7 (p. 357-358). For 7.4 note that \(\mu> 0.5\) is equivalent to \(\mu > \mu_0\) for our purposes and that while that number would have been used in calculating the t-statistic, it doesn’t affect the part of the calculation that you’re being asked to do. In 7.7 note that some of the calculations have been done for you, \(\bar{x}\) is the sample mean and \(s\) is the sample standard deviation.