Each lab in this course will have multiple components. First, there will be a document like the one below, which includes instructions, tutorials, and problems to be addressed in your write-up. Any part of the document below marked with a star, \(\star\), is a problem for your write-up. Second, there will be an R script where you will do all of the computations required for the lab. And third, you will complete a write-up to be turned in on the Friday that you did your lab.
If you are comfortable doing so, I strongly suggest using RMarkdown to type your lab write-up. However, if you are new to R, you may handwrite your write-up (I’m also happy to work with you to learn RMarkdown!). All of your computational work will be done in RStudio Cloud, and both your lab write-up and your R script will be considered when grading your work.
In this lab, you will work with different families of distributions and visualize the distributions of their order statistics. Then, you will use a dataset to make qualitative comparisons between plots of distributions and qq-plots.
R has no convenient built-in way to plot the pdfs of order statistics. Instead, we will write our own function that computes them so that we can plot them.
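os.pdf <- function(x, n, i, cdf, pdf, ...) {
  # Constant n! / ((i - 1)! (n - i)!)
  c <- factorial(n) / (factorial(i - 1) * factorial(n - i))
  # cdf and pdf of the underlying distribution, evaluated at each point of x
  F <- sapply(x, cdf, ...)
  f <- sapply(x, pdf, ...)
  # pdf of the i-th order statistic at each point of x
  c * (F^(i - 1)) * ((1 - F)^(n - i)) * f
}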
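The function evaluates the density of the i-th order statistic from a sample of size n, \(f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!} F(x)^{i-1} (1 - F(x))^{n-i} f(x)\), where F and f are the cdf and pdf of the underlying distribution. As a rough sanity check (a sketch, not part of the lab script), the snippet below confirms numerically that each density integrates to about 1, and that the minimum of five exponential draws with rate 0.1 is again exponential with rate 0.5. The names areas, x, and min.pdf are introduced here only for illustration.

```r
# Same definition as in the lab, repeated so this snippet runs on its own.
os.pdf <- function(x, n, i, cdf, pdf, ...) {
  c <- factorial(n) / (factorial(i - 1) * factorial(n - i))
  F <- sapply(x, cdf, ...)
  f <- sapply(x, pdf, ...)
  c * (F^(i - 1)) * ((1 - F)^(n - i)) * f
}

# Each order-statistic density should integrate to (approximately) 1.
areas <- sapply(1:5, function(i) {
  integrate(
    function(x) os.pdf(x, n = 5, i = i, cdf = pexp, pdf = dexp, rate = 0.1),
    lower = 0, upper = Inf
  )$value
})
print(round(areas, 4))

# The minimum (i = 1) of n = 5 draws from Exp(rate = 0.1) is Exp(rate = 0.5),
# so the two densities should agree up to floating-point error.
x <- c(0.5, 1, 2, 5, 10)
min.pdf <- os.pdf(x, n = 5, i = 1, cdf = pexp, pdf = dexp, rate = 0.1)
print(all.equal(min.pdf, dexp(x, rate = 0.5)))
```

Checks like these are a cheap way to catch a mistyped exponent or constant before plotting.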
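The function os.pdf takes the following inputs: x, the points at which to evaluate the pdf; n, the sample size; i, the index of the order statistic (from 1 to n); cdf and pdf, the R functions for the cdf and pdf of the underlying distribution (for example, pexp and dexp); and ..., any additional parameters (such as rate) that are passed on to cdf and pdf.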
To actually use this function to plot the pdfs of order statistics, we use ggplot along with the stat_function layer.
library(ggplot2)

# One stat_function layer per order statistic, i = 1, ..., 5
ggplot(data.frame(x = c(0, 25)), aes(x = x)) +
  stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 1, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x1")) +
  stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 2, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x2")) +
  stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 3, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x3")) +
  stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 4, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x4")) +
  stat_function(fun = os.pdf, geom = "line", args = list(n = 5, i = 5, cdf = pexp, pdf = dexp, rate = 0.1), aes(col = "x5")) +
  scale_colour_manual("Order Statistic", values = c("red", "purple", "blue", "green", "yellow"))
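Since the five stat_function layers differ only in the index i, they can also be generated in a loop. The following is one possible sketch, not the only way to do it: it builds the layers with lapply and adds the resulting list to the plot. The names n, layers, and p are introduced here for illustration, and os.pdf is repeated so that the snippet runs on its own.

```r
library(ggplot2)

# Same os.pdf as defined earlier in the lab.
os.pdf <- function(x, n, i, cdf, pdf, ...) {
  c <- factorial(n) / (factorial(i - 1) * factorial(n - i))
  F <- sapply(x, cdf, ...)
  f <- sapply(x, pdf, ...)
  c * (F^(i - 1)) * ((1 - F)^(n - i)) * f
}

n <- 5

# One stat_function layer per order statistic i = 1, ..., n.
# Each call to the anonymous function has its own i, so every layer
# gets its own colour label ("x1", ..., "x5").
layers <- lapply(seq_len(n), function(i) {
  stat_function(
    mapping = aes(colour = paste0("x", i)),
    fun = os.pdf, geom = "line",
    args = list(n = n, i = i, cdf = pexp, pdf = dexp, rate = 0.1)
  )
})

# A list of layers can be added to a ggplot object with `+`.
p <- ggplot(data.frame(x = c(0, 25)), aes(x = x)) +
  layers +
  scale_colour_manual(name = "Order Statistic",
                      values = c("red", "purple", "blue", "green", "yellow"))
print(p)
```

To change the distribution, only the entries of args and the plotting range in data.frame(x = ...) need to change.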
(\(\star\)) Plot the pdfs of n=5 order statistics for common distributions other than the exponential distribution. Include the plots in your write-up.
(\(\star\)) What observations do you make about the shapes of the pdfs of the order statistics and the relationships between them?
In R, qq-plots are easily made using the ggplot2 package. As an example, here’s how you recreate the first plot we saw in class.
ggplot() +
geom_qq(aes(sample = rnorm(99)))
The code above samples 99 points from a standard normal distribution, and then plots the sorted sample against the corresponding theoretical quantiles of the standard normal distribution (the default in geom_qq). Read the documentation of geom_qq for more information.
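To see the connection with order statistics, it can help to build the points of a normal qq-plot by hand. The sketch below uses only base R: the sorted sample (the observed order statistics) is paired with standard normal quantiles at the probabilities returned by ppoints. The seed and the names y, sample.q, theo.q, and qq are arbitrary choices for illustration. The snippet compares the result with base R's qqnorm; geom_qq should, to my understanding, follow the same recipe, but check its documentation to be sure.

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
y <- rnorm(99)
n <- length(y)

sample.q <- sort(y)          # observed order statistics
theo.q <- qnorm(ppoints(n))  # theoretical standard normal quantiles

# qqnorm computes the same pairs (returned in the original order of y),
# so after sorting they should match the hand-built versions.
qq <- qqnorm(y, plot.it = FALSE)
print(all.equal(sort(qq$x), theo.q))
print(all.equal(sort(qq$y), sample.q))

# The qq-plot itself: theoretical quantiles against sample quantiles.
plot(theo.q, sample.q, xlab = "theoretical", ylab = "sample")
```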
We will now use a dataset that’s built into R to gain some insight into the relationships between qq-plots and distributions. In your R script, you’ll load the dataset “faithful”, which has 272 observations of the eruptions of the Old Faithful geyser in Yellowstone National Park. The dataset consists of two variables: eruptions, the duration of each eruption in minutes, and waiting, the waiting time until the next eruption in minutes.
(\(\star\)) Plot the densities of each variable in the faithful dataset, and include the plots in your write-up.
(\(\star\)) Plot the qq-plots of each variable in the faithful dataset, and include the plots in your write-up.
(\(\star\)) Explain how the shapes of the densities determine the shapes of the qq-plots, and vice versa. In other words, if you only had one, how could you (roughly) figure out the shape of the other?