Statistics Dot Com

The wider shoulders, fatter tails of the t-distribution.

In the discussion it was noted that the t-distribution will have fatter tails and different shoulders than the standard normal distribution. Intuitively, when we estimate \( \sigma \) by \( s \) (the sample standard deviation), \( s \) can be larger than \( \sigma \), which puts more probability on small values of \( T \), and \( s \) can be smaller than \( \sigma \), which makes \( T \) bigger more often.
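A quick simulation can make this intuition concrete. The sketch below is only an illustration under assumed settings (samples of size 4 from a standard normal, 10000 replications, a cutoff of 2, and an arbitrary seed); it compares the statistic computed with the true \( \sigma \) to the one computed with \( s \).

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 4                           # assumed sample size, so df = n - 1 = 3
reps <- 10000                    # assumed number of simulated samples

# Each column of `samples` is one sample of size n from N(0, 1)
samples <- matrix(rnorm(n * reps), nrow = n)
xbar <- colMeans(samples)
s <- apply(samples, 2, sd)

z_stat <- xbar / (1 / sqrt(n))   # standardized with the true sigma = 1
t_stat <- xbar / (s / sqrt(n))   # standardized with the estimate s

# Proportion of each statistic falling beyond +/- 2
tail_z <- mean(abs(z_stat) > 2)
tail_t <- mean(abs(t_stat) > 2)
c(z = tail_z, t = tail_t)
```

The proportion for `t_stat` should come out noticeably larger than the one for `z_stat`, which is the fat tails showing up.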

Okay, there are many ways to visualize this using the basic EDA techniques.

Boxplots

Boxplots are a great tool to show center, spread, skew and tails. Here is a basic graphic to see that the tails are longer for small degrees of freedom. I'll use random samples:

l <- list(df3 = rt(1000, df = 3), df10 = rt(1000, df = 10), df50 = rt(1000, 
    df = 50), df100 = rt(1000, df = 100), norm = rnorm(1000))
boxplot(l)

As the degrees of freedom get larger, the outliers start to look more like those of the normal. Most books basically say to use the normal once the degrees of freedom pass 30 or 100. (Sort of depends on the amount of ink they want to burn on printing the tables.) We can see that by 100 it isn't so far off, but they do look a bit different.
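The boxplots depend on the random samples, so here is a small check that doesn't. This is just a sketch: the choice of the upper 2.5% point and of the particular degrees of freedom is mine, picked to match the usual table cutoffs.

```r
dfs <- c(3, 10, 30, 50, 100)

# Upper 2.5% critical value for each t distribution and for the normal
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# How far each t critical value sits above the normal one
round(rbind(df = dfs, t = t_crit, excess = t_crit - z_crit), 3)
```

The `excess` row shrinks as the degrees of freedom grow, but it never quite reaches 0.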

Density plots

Histograms are familiar graphs to see the shape, center, and spread of a distribution, but don't really lend themselves to comparisons. For that density plots work better, as you can put one on top of another. R has the density function to make density estimates of a distribution. Here we work instead with the “d” functions – which give the mathematical densities. The curve function can help us make these graphics:

curve(dnorm(x), -4, 4, col = "red")
curve(dt(x, df = 3), add = TRUE)

This graph shows the red curve (the normal) having tails that head to 0 faster than the t distribution. This is the “fat” tails bit.
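To put numbers on the tails, we can use the “p” functions in place of the “d” functions. A minimal sketch, with cutoffs of 2, 3, and 4 chosen only for illustration:

```r
cutoffs <- c(2, 3, 4)

p_norm <- 2 * pnorm(-cutoffs)        # P(|Z| > cutoff) for the standard normal
p_t3 <- 2 * pt(-cutoffs, df = 3)     # P(|T| > cutoff) with 3 degrees of freedom

data.frame(cutoff = cutoffs, normal = p_norm, t3 = p_t3, ratio = p_t3 / p_norm)
```

The `ratio` column grows quickly with the cutoff: the further out we look, the more the t tail dominates the normal one.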

QQplots

The quantile-quantile plot compares quantiles. This is actually a perfect graph for us. Typically we compare a random sample to a theoretical one (the normal). This is done with:

qqnorm(rt(1000, df = 3))

The distribution of the random sample is approximately normal if the graphic is “close” to a straight line. This clearly isn't.

Note that this graphic should be close:

qqnorm(rt(1000, df = 100))

The above graphics have a bit of randomness involved, so aren't “mathematically” precise. No worries. We have R ready to tell us the theoretical distribution, so we can compare these. The basic idea is to a) find the quantiles and then b) plot them on a scatterplot. Here we go:

df <- 3
qs <- seq(0.01, 0.99, by = 0.01)
plot(qt(qs, df = df), qnorm(qs))
abline(a = 0, b = 1)

If we increase the degrees of freedom, our graph changes:

df <- 100
qs <- seq(0.01, 0.99, by = 0.01)
plot(qt(qs, df = df), qnorm(qs))
abline(a = 0, b = 1)

The curve can be read to see if the distribution has longer or shorter tails, but we won't go there.
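Without reading the shape in detail, one rough way to summarize how far each plot is from the line is the largest gap between the two sets of quantiles. The helper `max_gap` below is a made-up name for this illustration, and the extra value df = 10 is my addition:

```r
qs <- seq(0.01, 0.99, by = 0.01)

# Largest distance between matching t and normal quantiles,
# i.e. how far the theoretical QQ points stray from the line y = x
max_gap <- function(df) max(abs(qnorm(qs) - qt(qs, df = df)))

gaps <- sapply(c(3, 10, 100), max_gap)
names(gaps) <- c("df3", "df10", "df100")
gaps
```

The gap drops as the degrees of freedom increase, matching what the two graphics show.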

Shoulders

The t-distribution is more concentrated around 0 in some sense. Not the obvious one. Compared to the normal, values are less likely to be close to 0, as the probability moves to the tails. But if you know the value is close to 0, then the t is more likely to be very near 0. This graphic shows the conditional distributions of the values (given that a value is in (-a, a), what is the shape?). Here we see the \( t \) more concentrated (its curve gets clipped at the top as we didn't fuss with the y-limits).

df <- 3
a <- 0.5
cond_norm_dist <- function(x) dnorm(x)/(pnorm(a) - pnorm(-a))
cond_t_dist <- function(x) dt(x, df = df)/(pt(a, df = df) - pt(-a, 
    df = df))
curve(cond_norm_dist(x), -a, a, col = "red")
curve(cond_t_dist(x), add = TRUE)
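The same comparison can be done with a single number. As a sketch, take the inner half of the interval (my choice, purely for illustration) and ask: given a value is in (-a, a), how likely is it to be in (-a/2, a/2)?

```r
df <- 3
a <- 0.5

# P(|X| < a/2 given |X| < a) for the standard normal and for the t distribution
cond_norm <- (pnorm(a / 2) - pnorm(-a / 2)) / (pnorm(a) - pnorm(-a))
cond_t <- (pt(a / 2, df = df) - pt(-a / 2, df = df)) /
    (pt(a, df = df) - pt(-a, df = df))

c(normal = cond_norm, t = cond_t)
```

The value for the t is a little larger than the one for the normal, so the difference in the shoulders is real but modest.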

I'm no stock market junkie, but I recall reading that stock returns (after taking a log) demonstrate similar features compared to the normal: longer tails and, when the changes are small, more concentration on the smallest changes.