General Instructions

Each lab in this course will have multiple components. First, there will be a document like the one below, which includes instructions, tutorials, and problems to be addressed in your write-up. Any part of the document marked with a star, \(\star\), is a problem for your write-up. Second, there will be an R script where you will do all of the computations required for the lab. Third, you will complete a write-up to be turned in on the Friday that you did your lab.

If you are comfortable doing so, I strongly suggest using RMarkdown to type your lab write-up. However, if you are new to R, you may handwrite your write-up (I’m also happy to work with you to learn RMarkdown!). All of your computational work will be done in RStudio Cloud, and both your lab write-up and your R script will be considered when grading your work.

Lab Overview

In this lab, you will work with different families of distributions and visualize the distributions of their order statistics. Then, you will use a dataset to make qualitative comparisons between plots of distributions and qq-plots.

Order Statistics

Base R has no built-in function for the pdf of an order statistic. Instead, we will write our own function that computes it, and then plot that function.

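os.pdf <- function(x, n, i, cdf, pdf, ...) {
        # Normalizing constant: n! / ((i - 1)! (n - i)!)
        const <- factorial(n)/(factorial(i - 1) * factorial(n - i))
        # Evaluate the cdf and pdf of the sampled distribution at each point of x
        Fx <- sapply(x, cdf, ...)
        fx <- sapply(x, pdf, ...)
        # pdf of the i-th order statistic, evaluated at x
        const * (Fx^(i - 1)) * ((1 - Fx)^(n - i)) * fx
}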
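The function evaluates the standard formula for the pdf of the i-th order statistic of n independent draws with cdf \(F\) and pdf \(f\), namely \(f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!} F(x)^{i-1} (1 - F(x))^{n-i} f(x)\). The sketch below is one possible sanity check, not part of the lab script: it repeats the same computation so that it runs on its own, confirms numerically that each pdf integrates to about 1, and compares the pdf of the minimum against the known result that the minimum of n Exponential(rate) draws is Exponential(n * rate). The names areas and max_diff are made up for this illustration.

```r
# Same computation as os.pdf in the lab, repeated so this block is self-contained
os.pdf <- function(x, n, i, cdf, pdf, ...) {
        const <- factorial(n)/(factorial(i - 1) * factorial(n - i))
        Fx <- sapply(x, cdf, ...)
        fx <- sapply(x, pdf, ...)
        const * (Fx^(i - 1)) * ((1 - Fx)^(n - i)) * fx
}

# Check 1: each order-statistic pdf should integrate to (approximately) 1
areas <- sapply(1:5, function(i) {
        integrate(function(x) os.pdf(x, n = 5, i = i, cdf = pexp, pdf = dexp, rate = 0.1),
                  lower = 0, upper = Inf)$value
})
print(round(areas, 4))

# Check 2: for i = 1 the pdf should match an Exponential with rate n * 0.1 = 0.5
x <- seq(0, 25, by = 0.5)
max_diff <- max(abs(os.pdf(x, n = 5, i = 1, cdf = pexp, pdf = dexp, rate = 0.1) -
                    dexp(x, rate = 0.5)))
print(max_diff)
```

If either check fails after you modify os.pdf, the most likely culprits are the exponents i - 1 and n - i.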

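The function os.pdf takes the following arguments:

  - x: the point(s) at which to evaluate the pdf of the order statistic
  - n: the sample size
  - i: the index of the order statistic, from 1 (the minimum) to n (the maximum)
  - cdf: the cdf of the distribution being sampled (for example, pexp)
  - pdf: the pdf of the distribution being sampled (for example, dexp)
  - ...: any additional parameters passed on to cdf and pdf (for example, rate = 0.1)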

To actually use this function to plot the pdfs of order statistics, we use ggplot along with the stat_function layer.

library(ggplot2)  # provides ggplot, stat_function, and geom_qq

ggplot(data.frame(x=c(0,25)),aes(x=x)) +
        stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 1, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x1")) +
        stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 2, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x2")) +
        stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 3, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x3")) +
        stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 4, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x4")) +
        stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 5, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x5")) +
        scale_colour_manual("Order Statistic", values = c("red", "purple", "blue", "green", "yellow"))
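The five stat_function layers differ only in i and in the colour label, so they can also be generated in a loop. The following is a minimal sketch of that alternative, assuming ggplot2 is installed; it relies on the fact that a list of layers can be added to a plot with +. The names layers and p are made up for this illustration, and os.pdf is repeated so the block runs on its own.

```r
library(ggplot2)

# Same computation as os.pdf in the lab, repeated so this block is self-contained
os.pdf <- function(x, n, i, cdf, pdf, ...) {
        const <- factorial(n)/(factorial(i - 1) * factorial(n - i))
        Fx <- sapply(x, cdf, ...)
        fx <- sapply(x, pdf, ...)
        const * (Fx^(i - 1)) * ((1 - Fx)^(n - i)) * fx
}

n <- 5

# One stat_function layer per order statistic. Each call of the anonymous
# function has its own value of i, so the colour label is built correctly.
layers <- lapply(1:n, function(i) {
        stat_function(fun = os.pdf, geom = "line",
                      args = list(n = n, i = i, cdf = pexp, pdf = dexp, rate = 0.1),
                      mapping = aes(colour = paste0("x", i)))
})

p <- ggplot(data.frame(x = c(0, 25)), aes(x = x)) +
        layers +
        scale_colour_manual("Order Statistic",
                            values = c("red", "purple", "blue", "green", "yellow"))
print(p)
```

Using lapply rather than a for loop matters here: aes evaluates its arguments lazily, so a plain for loop may leave every layer using the last value of i.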

  1. (\(\star\)) Plot the pdfs of the order statistics for a sample of size n = 5 from common distributions other than the exponential distribution. Include the plots in your write-up.

  2. (\(\star\)) What observations do you make about the shapes of the pdfs of the order statistics and the relationships between them?

QQ-Plots

In R, qq-plots are easily made using the ggplot2 package. As an example, here’s how you recreate the first plot we saw in class.

ggplot() + 
        geom_qq(aes(sample = rnorm(99)))

The code above draws 99 samples from a standard normal distribution and plots the sorted samples against the theoretical quantiles of the standard normal distribution (the default in geom_qq). Read the documentation of geom_qq for more information.
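To see the connection with order statistics, it can help to build the points of a qq-plot by hand. The sketch below uses only base R and reflects roughly what geom_qq computes by default, as far as I understand it: the sorted sample (the observed order statistics) is paired with standard normal quantiles at the plotting positions given by ppoints. The seed and the names sample_q, theo_q, qq, and base_qq are arbitrary choices for this illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(99)
n <- length(x)

sample_q <- sort(x)              # observed order statistics
theo_q <- qnorm(ppoints(n))      # standard normal quantiles at the plotting positions

qq <- data.frame(theoretical = theo_q, sample = sample_q)
print(head(qq))

# Base R's qqnorm uses the same plotting positions, so the theoretical
# quantiles it returns (once sorted) should agree with theo_q
base_qq <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(base_qq$x), theo_q))
```

Plotting qq$sample against qq$theoretical should reproduce the shape of the geom_qq plot for the same sample.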

We will now use a dataset that’s built into R to gain some insight into the relationships between qq-plots and distributions. In your R script, you’ll load the dataset “faithful”, which has 272 observations of the eruptions of the Old Faithful geyser in Yellowstone National Park. The dataset consists of two variables:

  - eruptions: the duration of the eruption, in minutes
  - waiting: the waiting time until the next eruption, in minutes
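Before plotting, it is worth a quick look at the data. This short sketch, which uses only base R, loads faithful and prints its structure and a summary of its two numeric columns, eruptions and waiting.

```r
data(faithful)       # built-in dataset from the datasets package

str(faithful)        # 272 observations of 2 numeric variables
summary(faithful)    # five-number summary and mean for each variable
```

The column names shown by str are the ones to use in aes when making the plots below.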

  1. (\(\star\)) Plot the densities of each variable in the faithful dataset, and include the plots in your write-up.

  2. (\(\star\)) Plot the qq-plots of each variable in the faithful dataset, and include the plots in your write-up.

  3. (\(\star\)) Explain how the shapes of the densities determine the shapes of the qq-plots, and vice versa. In other words, if you only had one, how could you (roughly) figure out the shape of the other?