The t Distribution and t Tests

These commands help visualize the \(t\) distribution, perform \(t\) probability calculations (e.g., to calculate p-values), and run \(t\) tests for a single sample mean, for paired data, and for comparing means from two independent samples. Some commands require the mosaic package (noted in comments), so install it before running them.

library(mosaic)

The \(t\) distribution and probability calculations

Visualizing the \(t\) distribution using plotDist():

# plotDist() is in the mosaic package
plotDist(dist='t', df=8, col="blue")

plotDist(dist='t', df=2, col="red", add=TRUE)

# add standard normal for comparison
plotDist(dist='norm', col="black", lty="dashed", add=TRUE)

Making \(t\) cumulative probability calculations using pt():

# Note that we usually use positive t values and
# are interested in upper-tail probabilities,
# so use lower.tail=FALSE
T.prob <- pt(q=2.82, df=8, lower.tail=FALSE)
T.prob

## [1] 0.01124694
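Because the \(t\) distribution is symmetric about zero, the upper-tail area above 2.82 equals the lower-tail area below -2.82, and a two-sided p-value is the sum of both tails. The following is a minimal sketch in base R; the variable names (t_obs, upper, lower, p_two_sided) are illustrative and do not come from the original commands.

```r
# Base R only; variable names are illustrative
t_obs <- 2.82
df <- 8

upper <- pt(q = t_obs, df = df, lower.tail = FALSE)  # area to the right of 2.82
lower <- pt(q = -t_obs, df = df)                     # area to the left of -2.82

# symmetry: the two tail areas agree
all.equal(upper, lower)

# two-sided p-value: both tails together
p_two_sided <- 2 * pt(q = abs(t_obs), df = df, lower.tail = FALSE)
p_two_sided
```

Using abs() on the statistic keeps the two-sided calculation correct when the observed \(t\) value is negative.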

Making and visualizing \(t\) cumulative probability calculations using xpt():

# xpt() is in the mosaic package
xpt(q=2.82, df=9, lower.tail=FALSE)

## [1] 0.0100235

Finding critical \(t^*\) values using qt():

# for a 95% confidence level, we want 0.025 in the upper tail
T.crit <- qt(p=0.025, df=8, lower.tail=FALSE)
T.crit

## [1] 2.306004
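To see how the critical value depends on the degrees of freedom, it can help to tabulate \(t^*\) for several df values next to the standard normal critical value. This is a sketch in base R; the particular df values and the names dfs, t_crit, and z_crit are chosen only for illustration.

```r
# critical values for a 95% confidence level at several degrees of freedom
dfs <- c(2, 8, 30, 100, 1000)               # illustrative choices
t_crit <- qt(p = 0.025, df = dfs, lower.tail = FALSE)

# standard normal critical value for comparison
z_crit <- qnorm(p = 0.025, lower.tail = FALSE)

# t* shrinks toward z* as df grows
data.frame(df = dfs, t_crit = t_crit, z_crit = z_crit)
```

The t critical values are always larger than the normal one, which reflects the heavier tails seen in the density plots earlier.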

Finding and visualizing critical \(t^*\) values using xqt():

# xqt() is in the mosaic package
# for a 95% confidence level, we want 0.025 in the upper tail,
# which is the same as 0.975 in the lower tail
xqt(p=0.975, df=8)

## [1] 2.306004

Using \(t\) tests on a single mean

# We will use the dataset "KidsFeet" from the 'mosaicData' package
# Foot measurements for 39 fourth graders
data(KidsFeet)
head(KidsFeet) # preview the first six rows

##     name birthmonth birthyear length width sex biggerfoot domhand
## 1  David          5        88   24.4   8.4   B          L       R
## 2   Lars         10        87   25.4   8.8   B          L       L
## 3   Zach         12        87   24.5   9.7   B          R       R
## 4   Josh          1        88   25.2   9.8   B          L       R
## 5   Lang          2        88   25.1   8.9   B          L       R
## 6 Scotty          3        88   25.7   9.7   B          R       R

Run a \(t\)-test to get a confidence interval for a single sample mean and test the null hypothesis that \(\mu=0\) (the default) using t.test():

# t.test() runs a bit differently depending on whether mosaic is loaded or not
# one way to do it
t.test(x=KidsFeet$length, conf.level=0.95) # 0.95 is the default but can be changed

## 
##  One Sample t-test
## 
## data:  KidsFeet$length
## t = 117.18, df = 38, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  24.29597 25.15019
## sample estimates:
## mean of x 
##  24.72308

# another way
t.test(~length, data=KidsFeet)

## 
##  One Sample t-test
## 
## data:  length
## t = 117.18, df = 38, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  24.29597 25.15019
## sample estimates:
## mean of x 
##  24.72308
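The confidence interval reported by t.test() is the sample mean plus or minus \(t^*\) times the standard error. The sketch below reproduces that calculation by hand and checks it against t.test(). To stay self-contained it uses only the six lengths printed by head(KidsFeet) above, so the numbers will differ from the full 39-row result; the variable names are illustrative.

```r
# first six foot lengths from the head(KidsFeet) output
# (a small subset used only for illustration, not the full dataset)
x <- c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7)

n <- length(x)
xbar <- mean(x)
se <- sd(x) / sqrt(n)                 # standard error of the mean
t_star <- qt(p = 0.975, df = n - 1)   # critical value for 95% confidence

# interval computed by hand: mean +/- t* x SE
ci_manual <- c(xbar - t_star * se, xbar + t_star * se)
ci_manual

# interval reported by t.test() for the same data
ci_ttest <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)
ci_ttest
```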

Test the null hypothesis \(H_0: \mu=24\) using t.test():

t.test(x=KidsFeet$length, mu=24)

## 
##  One Sample t-test
## 
## data:  KidsFeet$length
## t = 3.4272, df = 38, p-value = 0.001479
## alternative hypothesis: true mean is not equal to 24
## 95 percent confidence interval:
##  24.29597 25.15019
## sample estimates:
## mean of x 
##  24.72308

Test the null hypothesis above with a one-sided alternative, \(H_A: \mu \gt 24\), using t.test():

t.test(x=KidsFeet$length, mu=24, alternative="greater")

## 
##  One Sample t-test
## 
## data:  KidsFeet$length
## t = 3.4272, df = 38, p-value = 0.0007397
## alternative hypothesis: true mean is greater than 24
## 95 percent confidence interval:
##  24.36737      Inf
## sample estimates:
## mean of x 
##  24.72308
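The test statistic and both p-values can also be computed directly from \(t = (\bar{x} - \mu_0) / (s/\sqrt{n})\) and pt(). The sketch below does this and compares the results with t.test(). As an assumption for illustration, it uses only the six lengths shown by head(KidsFeet), so its values will not match the full-data output above; the variable names are illustrative.

```r
# first six foot lengths from the head(KidsFeet) output (illustrative subset)
x <- c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7)
mu0 <- 24                                   # hypothesized mean
n <- length(x)

# test statistic computed by hand
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# two-sided and one-sided ("greater") p-values from the t distribution
p_two <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# the same quantities from t.test()
res_two <- t.test(x, mu = mu0)
res_greater <- t.test(x, mu = mu0, alternative = "greater")

c(manual = t_stat, t.test = unname(res_two$statistic))
c(manual = p_two, t.test = res_two$p.value)
c(manual = p_greater, t.test = res_greater$p.value)
```

When the observed statistic is positive, the "greater" p-value is half the two-sided one, which is the pattern visible in the two outputs above (0.001479 versus 0.0007397).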

Comparing paired means using a \(t\)-test

Paired data are data in which two variables are measured on the same individuals, such as before vs. after some treatment. Though it does not really make sense, we can compare the widths of kids’ feet to their lengths. The data here are paired because both measurements come from the same feet of the same individual kids.

Compare two paired means using t.test():

# t.test() runs a bit differently depending on whether mosaic is loaded or not
t.test(x=KidsFeet$length, y=KidsFeet$width, paired=TRUE)

## 
##  Paired t-test
## 
## data:  KidsFeet$length and KidsFeet$width
## t = 92.219, df = 38, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  15.38545 16.07609
## sample estimates:
## mean of the differences 
##                15.73077
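A paired \(t\)-test is equivalent to a one-sample \(t\)-test on the within-pair differences, which is why the output reports the "mean of the differences". The sketch below checks this equivalence. It assumes only the six length and width values printed by head(KidsFeet), so the numbers differ from the full-data result; the names len and wid are illustrative.

```r
# first six lengths and widths from the head(KidsFeet) output (illustrative subset)
len <- c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7)
wid <- c(8.4, 8.8, 9.7, 9.8, 8.9, 9.7)

# paired test on the two columns
paired <- t.test(x = len, y = wid, paired = TRUE)

# one-sample test on the differences
one_sample <- t.test(x = len - wid)

# statistic, p-value, and confidence interval agree
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
rbind(paired = as.numeric(paired$conf.int),
      one_sample = as.numeric(one_sample$conf.int))
```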

Comparing two independent means using a \(t\)-test

We can compare kids whose left foot is bigger to those whose right foot is bigger. First, let’s look at a figure. Here the black data points are jittered, the red point is the group mean, and the red bar spans the mean plus or minus one standard error.

ggplot(KidsFeet, aes(x=biggerfoot, y=length)) +
  geom_jitter(width=0.1) +
  stat_summary(fun.data="mean_se", col="red")

Perform a \(t\)-test to test the null hypothesis \(H_0: \mu_{Lbig}=\mu_{Rbig}\) against the alternative \(H_A: \mu_{Lbig} \ne \mu_{Rbig}\) using t.test():

# t.test() runs a bit differently depending on whether mosaic is loaded or not
t.test(length~biggerfoot, data=KidsFeet)

## 
##  Welch Two Sample t-test
## 
## data:  length by biggerfoot
## t = 2.1122, df = 31.734, p-value = 0.04264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.03088997 1.71937741
## sample estimates:
## mean in group L mean in group R 
##        25.10455        24.22941

Note that when comparing two means, the confidence interval is for the difference between the means (here, group L minus group R).
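The sketch below illustrates that the interval is centered on the difference of the two group means, and contrasts the default Welch test with the pooled-variance version requested by var.equal=TRUE. It is a toy example under stated assumptions: the data frame feet is a hypothetical name holding only the six rows printed by head(KidsFeet), so group R has just two observations and the results are not meaningful beyond showing the mechanics.

```r
# six rows from the head(KidsFeet) output (illustrative subset; 'feet' is a hypothetical name)
feet <- data.frame(
  length = c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7),
  biggerfoot = factor(c("L", "L", "R", "L", "L", "R"))
)

# default: Welch test, which does not assume equal variances
welch <- t.test(length ~ biggerfoot, data = feet)

# difference of the group means, first level (L) minus second level (R)
diff_means <- unname(welch$estimate[1] - welch$estimate[2])
ci <- as.numeric(welch$conf.int)

# the confidence interval is centered on that difference
c(difference = diff_means, ci_midpoint = mean(ci))

# pooled-variance test for comparison; its df is n1 + n2 - 2
pooled <- t.test(length ~ biggerfoot, data = feet, var.equal = TRUE)
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```

The Welch degrees of freedom are generally not an integer, as in the df = 31.734 reported above for the full data.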