7.5 - the t distribution

Motivation

  • We will continue to consider properties of a sample \(Y_1,Y_2,...,Y_n\) drawn i.i.d. from a \(N(\mu,\sigma^2)\) population
  • One important sampling distribution that arises from normal samples is the t-distribution

A bit of history

  • The t-distribution is sometimes referred to as Student’s t-distribution
  • Named for Student, the pseudonym used by William Sealy Gosset, who published under this name while working at the Guinness brewery in Dublin
  • Gosset was interested in the properties of small samples drawn from normal populations

Gosset

Image source: Wikipedia

The t-distribution pdf

  • A random variable \(T\) is said to follow a t-distribution with \(\nu\) degrees of freedom, written \(T\sim t_\nu\), if its pdf is:

\[f_T(t) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\Gamma(\frac{\nu}{2})} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, -\infty < t < \infty\]
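
As a quick check of this formula, the sketch below codes the pdf directly in R and compares it with the built-in dt(). The helper name t_pdf and the values of t and \(\nu\) are our own illustrative choices.

```r
# Hand-coded t pdf; t_pdf is a hypothetical helper name, not a base R function
t_pdf <- function(t, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) *
    (1 + t^2 / nu)^(-(nu + 1) / 2)
}

t_vals <- seq(-4, 4, by = 0.5)   # illustrative evaluation points
nu <- 5                          # illustrative degrees of freedom

# Largest absolute difference from the built-in density; should be essentially 0
max_diff <- max(abs(t_pdf(t_vals, nu) - dt(t_vals, df = nu)))
max_diff
```

For large \(\nu\) (a few hundred), gamma() overflows, so working on the log scale with lgamma() is the safer choice.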

Plots of the t-distribution

  • Symmetric about zero, bell-shaped
  • As \(\nu\rightarrow \infty\), \(t_\nu \rightarrow N(0,1)\)

[Figure: t distributions with various df and the standard normal distribution]
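
A plot along these lines can be produced with base R graphics. This is a minimal sketch: the df values (1, 3, 10) and the plotting range are our own choices, not necessarily those used in the original figure.

```r
dfs <- c(1, 3, 10)                  # illustrative degrees of freedom (assumed values)
x <- seq(-4, 4, length.out = 401)

# Standard normal as the reference curve
plot(x, dnorm(x), type = "l", lwd = 2, xlab = "t", ylab = "density")

# Overlay the t densities, one line type per df
for (i in seq_along(dfs)) {
  lines(x, dt(x, df = dfs[i]), lty = i + 1)
}
legend("topright", legend = c("N(0,1)", paste("df =", dfs)),
       lty = seq_len(length(dfs) + 1), lwd = c(2, rep(1, length(dfs))))

# Numerical view of the convergence: the largest gap to N(0,1) shrinks as df grows
gaps <- sapply(dfs, function(nu) max(abs(dt(x, df = nu) - dnorm(x))))
gaps
```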

Where does it come from?

  • Suppose:
    • \(Z\sim N(0,1)\)
    • \(W \sim \chi^2_\nu\)
    • \(Z\perp\!\!\!\perp W\)
  • Then:

\[T = \frac{Z}{\sqrt{W/\nu}} \sim t_\nu\]

  • To prove:
    • Carry out \(2\rightarrow 2\) transformation of \((Z,W) \rightarrow (T,U)\)
    • Integrate out \(U\)
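
Before proving the result, the construction can be checked by simulation. The sketch below draws independent \(Z\) and \(W\), forms \(T\), and compares empirical cumulative probabilities with pt(). The seed, \(\nu = 4\), and the number of draws are arbitrary choices.

```r
set.seed(1)                   # arbitrary seed, for reproducibility
nu <- 4                       # illustrative degrees of freedom
n_sim <- 100000

z <- rnorm(n_sim)             # Z ~ N(0,1)
w <- rchisq(n_sim, df = nu)   # W ~ chi-squared with nu df, drawn independently of Z
t_sim <- z / sqrt(w / nu)     # T = Z / sqrt(W / nu)

# Compare empirical and theoretical cumulative probabilities at a few points
pts <- c(-2, -1, 0, 1, 2)
emp <- sapply(pts, function(q) mean(t_sim <= q))
theo <- pt(pts, df = nu)
round(cbind(point = pts, empirical = emp, theoretical = theo), 3)
```

The two probability columns should agree up to simulation error.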

Joint distribution of \((Z,W)\)

Since \(Z\) and \(W\) are independent:

\[ f_{Z,W}(z,w) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \cdot \frac{1}{2^{\nu/2}\Gamma(\nu/2)} w^{\nu/2-1} e^{-w/2}, \quad w>0, z \in \mathbb{R} \]

Transformation \((Z,W) \rightarrow (T,U)\)

  • Let \(T = \frac{Z}{\sqrt{W/\nu}}\), \(U =W\)
  • Joint support: \(T \in \mathbb{R}\),\(U>0\)
  • Inverses:

\[Z = T\sqrt{\frac{U}{\nu}}\]

\[W = U\]

  • Jacobian:

\[J = \begin{bmatrix} \sqrt{\frac{U}{\nu}}& 0\\ \frac{T}{2\sqrt{U\nu}} & 1 \end{bmatrix} \Rightarrow |\det(J)| = \sqrt{\frac{U}{\nu}}\]

Joint pdf of \((T,U)\)

\[f_{T,U}(t,u) = f_{Z,W}\left(t\sqrt{\frac{u}{\nu}},u\right)\cdot \sqrt{\frac{u}{\nu}}\]

\[f_{T,U}(t,u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{t^2u}{2\nu}\right) \cdot \frac{1}{2^{\nu/2}\Gamma(\nu/2)} u^{\nu/2-1} e^{-u/2} \cdot \sqrt{\frac{u}{\nu}}\] \[= \frac{1}{\sqrt{2\pi\nu}\,2^{\nu/2}\Gamma(\nu/2)} u^{(\nu+1)/2-1} \exp\!\left(-\frac{u}{2}\left(1+\frac{t^2}{\nu}\right)\right), t \in \mathbb{R}, u > 0\]

Marginal of \(T\)

\[f_T(t) = \int_{0}^{\infty}\frac{1}{\sqrt{2\pi\nu}\,2^{\nu/2}\Gamma(\nu/2)} u^{(\nu+1)/2-1} \exp\!\left(-\frac{u}{2}\left(1+\frac{t^2}{\nu}\right)\right)\, du= \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\Gamma(\frac{\nu}{2})} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}\]

Proof: practice!
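
As a hint for the practice proof, the integrand is a gamma kernel: \(\int_0^\infty u^{a-1}e^{-bu}\,du = \Gamma(a)/b^a\) for \(a, b > 0\). The result can also be sanity-checked numerically before working it out by hand. The sketch below integrates the joint pdf over \(u\) at one point and compares it with dt(). The helper name joint_tu and the values of \(\nu\) and \(t\) are our own illustrative choices.

```r
nu <- 6      # illustrative degrees of freedom
t0 <- 1.3    # illustrative point at which to evaluate the marginal

# Joint pdf of (T, U) as a function of u for fixed t (hypothetical helper name)
joint_tu <- function(u, t, nu) {
  1 / (sqrt(2 * pi * nu) * 2^(nu / 2) * gamma(nu / 2)) *
    u^((nu + 1) / 2 - 1) * exp(-u / 2 * (1 + t^2 / nu))
}

# Integrate out u numerically and compare with the closed-form t pdf
marginal <- integrate(function(u) joint_tu(u, t0, nu), lower = 0, upper = Inf)$value
c(numerical = marginal, closed_form = dt(t0, df = nu))
```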

Application to normal samples

  • Consider a sample \(Y_1,Y_2,...,Y_n\) drawn i.i.d. from a \(N(\mu,\sigma^2)\) population
  • Then:

\[\frac{\bar Y - \mu}{S/\sqrt{n}} \sim t_{n-1}\]

Outline of proof (rest left for practice):

\[\frac{\bar Y - \mu}{S/\sqrt{n}} = \frac{\frac{\bar Y - \mu}{\sigma/\sqrt{n}}}{S/\sigma} = \frac{\frac{\bar Y - \mu}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)S^2}{\sigma^2}\big/(n-1)}}\]
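
The sampling distribution can also be checked by simulation. The sketch below repeatedly draws normal samples, computes the statistic, and compares its empirical cumulative probabilities with both \(t_{n-1}\) and \(N(0,1)\). The values of \(n\), \(\mu\), \(\sigma\), the seed, and the number of replications are arbitrary choices.

```r
set.seed(2)                      # arbitrary seed, for reproducibility
n <- 5; mu <- 10; sigma <- 3     # illustrative values (assumed)
n_sim <- 20000

# One t statistic per simulated sample
t_stats <- replicate(n_sim, {
  y <- rnorm(n, mean = mu, sd = sigma)
  (mean(y) - mu) / (sd(y) / sqrt(n))
})

pts <- c(-2, 0, 2)
emp <- sapply(pts, function(q) mean(t_stats <= q))
theo_t <- pt(pts, df = n - 1)    # t distribution with n - 1 df
theo_z <- pnorm(pts)             # standard normal, for contrast
round(cbind(point = pts, empirical = emp, t = theo_t, normal = theo_z), 3)
```

With a sample size this small, the empirical tail probabilities should track the t column noticeably better than the normal column.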

t distribution in R

  • dt(x, df): evaluate the density \(f_T(x)\)
  • pt(q, df): evaluate the cumulative probability \(P(T \le q)\)
  • qt(p, df): find the quantile \(q\) such that \(P(T \le q) = p\)
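
A short illustration of the three functions, with 10 degrees of freedom as an arbitrary choice:

```r
nu <- 10                 # illustrative degrees of freedom

dt(0, df = nu)           # density at 0
pt(1.5, df = nu)         # cumulative probability P(T <= 1.5)
q975 <- qt(0.975, df = nu)
q975                     # the 0.975 quantile, approximately 2.228

# pt() and qt() are inverses of each other
p <- pt(1.5, df = nu)
q_back <- qt(p, df = nu)
q_back

# Upper-tail probability P(T > 1.5)
upper <- pt(1.5, df = nu, lower.tail = FALSE)
upper
```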

Application: 95% confidence interval for \(\mu\)

  • Consider a sample of size \(n\) drawn i.i.d. from a normal population with mean \(\mu\)
  • With \(q_{0.025}\) and \(q_{0.975}\) denoting the 0.025 and 0.975 quantiles of \(t_{n-1}\) (in R: qt(0.025, n-1) and qt(0.975, n-1)):

[Figure: t distribution with quantiles]

\[\small 0.95 = P\left(q_{0.025} \le \frac{\bar Y-\mu}{S/ \sqrt{n}} \le q_{0.975} \right) = P\left(\bar Y - q_{0.025}\cdot \frac{S}{\sqrt{n}} \ge \mu \ge \bar Y - q_{0.975}\cdot \frac{S}{\sqrt{n}} \right)\]

Since the t-distribution is symmetric about zero, \(q_{0.025} = -q_{0.975}\), so the endpoints are often expressed as:

\[\small \bar Y \pm q_{0.975}\cdot S/\sqrt{n}\]
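
The interval can be computed by hand in R and cross-checked against t.test(). The data below are made up purely for illustration.

```r
# Hypothetical data, for illustration only
y <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4)
n <- length(y)

q975 <- qt(0.975, df = n - 1)      # 0.975 quantile of the t distribution with n - 1 df
se <- sd(y) / sqrt(n)              # S / sqrt(n)
ci_manual <- mean(y) + c(-1, 1) * q975 * se
ci_manual

# Cross-check against the interval reported by t.test()
ci_builtin <- as.numeric(t.test(y, conf.level = 0.95)$conf.int)
ci_builtin
```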

Application: paired t-test

  • The t-distribution is often used to test hypotheses about the population mean difference \(\mu_d\) in matched/paired observations:

\[H_0: \mu_d =0\] \[H_a: \mu_d \ne 0\]

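  • If \(H_0\) is true, and assuming the individual differences are normally distributed:

\[ \frac{\bar Y_d-0}{S_d/ \sqrt{n}}\sim t_{n-1}\]

  • Here \(\bar Y_d\) and \(S_d\) are the sample mean and standard deviation of the \(n\) differences

  • p-value = \(2\times P\left(T> \left|\frac{\bar Y_d-0}{S_d/ \sqrt{n}}\right|\right)\), where \(T \sim t_{n-1}\)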
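
The test statistic and p-value can be computed by hand from the differences and cross-checked against t.test() with paired = TRUE. The before/after measurements below are made up purely for illustration.

```r
# Hypothetical before/after measurements on the same 8 subjects
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 13.4, 12.0)
after  <- c(11.6, 11.5, 12.2, 12.1, 11.8, 11.7, 12.9, 11.6)

d <- before - after                  # individual differences
n <- length(d)                       # number of pairs

# Test statistic: mean difference over its estimated standard error
t_obs <- (mean(d) - 0) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 df
p_manual <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)
c(t = t_obs, p_value = p_manual)

# Cross-check with the built-in paired test
res <- t.test(before, after, paired = TRUE)
c(t = unname(res$statistic), p_value = res$p.value)
```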